Published: 2018-06-16
Medical Image Processing
Received: 2017-10-26; revised: 2017-12-05
Foundation items: National Natural Science Foundation of China (61272373, 61202254, 71303031); Natural Science Foundation of Liaoning Province (201602195, DC201502030202); Fundamental Research Funds for the Central Universities (DC13010313, DC201502030202); Doctoral Scientific Research Foundation of Liaoning Province (201601084)
First author:
Yu Yuhai (1980- ), lecturer, Ph.D. candidate (2018) in computer software and theory at Dalian University of Technology. His research interests include convolutional neural networks and biomedical image processing. E-mail: yuyh@dlnu.edu.cn.
CLC number: TP391.6
Document code: A
Article ID: 1006-8961(2018)06-0917-00
Abstract
Objective Figures in the biomedical literature are often compound images containing multiple modalities. Automatically annotating their modality classes can improve image retrieval performance and support medical research and teaching. Method Multi-label classification models based on deep convolutional neural networks are built for two modalities of information, image content and caption text. The visual model borrows natural images and single-label simple biomedical figures to perform heterogeneous and homogeneous transfer learning, capturing general features of the common domain and specialized features of the biomedical domain, while the textual model performs homogeneous transfer learning on the captions of simple biomedical figures. A staged fusion strategy then combines the outputs of the two modality-specific models to recognize the relevant modalities of multi-label medical images. Result The proposed cross-modal multi-label classification algorithm is evaluated on the dataset of the ImageCLEF2016 biomedical image multi-label classification task. The content-based hybrid transfer learning method achieves a lower Hamming loss and a higher macro-averaged F1 than the method using heterogeneous transfer learning alone. Introducing homogeneous transfer learning into the textual model clearly improves label classification performance. Finally, fusing the two modality-specific models yields a Hamming loss close to the best result of the evaluation task, while the macro-averaged F1 rises from 0.320 to 0.488, an improvement of about 52.5%. Conclusion The experimental results show that the proposed cross-modal multi-label classification algorithm, which fuses image content with caption text and introduces homogeneous and heterogeneous data for transfer learning, alleviates the small scale and imbalanced label distribution of annotated biomedical image data, recognizes modality information in compound medical images more effectively, and thereby improves image retrieval performance.
Keywords
multi-label classification; convolutional neural network; transfer learning; biomedical image; deep learning
Abstract
Objective The amount of biomedical literature in electronic format has increased considerably with the development of the Internet. PubMed comprises more than 27 million citations for biomedical literature linking to full-text content from PubMed Central and publisher web sites. The figures in these biomedical studies can be retrieved through tools along with the full text. However, the lack of associated metadata, apart from the captions, hinders the fulfillment of richer information requirements of biomedical researchers and educators. The modality of a figure is an extremely useful type of metadata. Therefore, biomedical modality classification is an important primary step that can help users access the biomedical images they require and further improve the performance of the literature retrieval system. Many images in the biomedical literature (more than 40%) are compound figures including several subfigures with various biomedical modalities, such as computerized tomography, X-ray, or generic biomedical illustrations. The subfigures in one compound figure may describe one medical problem in several views and have strong semantic correlation with each other. Thus, these figures are valuable to biomedical research and education. The standard approach to modality recognition from biomedical compound figures first detects whether the figure is compound. If it is, a figure separation algorithm is invoked to split it into its constituent subfigures. Then, a multi-class classifier is used to predict the modality of each subfigure. Nevertheless, the figure separation algorithms are not perfect, and the errors in figure separation propagate to the multi-class model for modality classification. Recently, some multi-label learning models use pre-trained convolutional neural networks to extract high-level features to recognize the image modalities from the compound figures. 
These deep learning methods learn more expressive representations of image data. However, convolutional neural networks may be hindered from disentangling the factors of variation by the limited number of highly variable samples and the imbalanced label distribution of the training data. A new cross-modal multi-label classification model using convolutional neural networks based on hybrid transfer learning is presented to learn biomedical modality information from the compound figure without separating it into subfigures. Method An end-to-end training and multi-label classification method, which does not require additional classifiers, is proposed. Two convolutional neural networks are built to learn the modality components of an image, not from single separated subfigures, but from labeled compound figures and their captions. The proposed cross-modal model learns general domain features from large-scale nature images and more special biomedical domain features from the simple figures and their captions in biomedical literature, leveraging techniques of heterogeneous and homogeneous transfer learning. Specifically, the proposed visual convolutional neural network (CNN) is pre-trained on a large auxiliary dataset, which contains approximately 1.2 million labeled training images of 1000 classes. Then, the top layer of the deep CNN is trained from scratch on single-label simple biomedical figures to achieve homogeneous transfer learning. The key point of such transfer learning is fine-tuning the pre-trained deep visual models on the current multi-label compound figure dataset. The architecture of the deep visual model is changed slightly so that it can be fine-tuned on the current dataset. On the other hand, the weights of the embedding layer are initialized by the word vectors, which are pre-trained on captions extracted from 300 000 biomedical articles in PubMed, and are updated while training the networks. 
Similar to the homogeneous transfer learning strategy of the visual model, the proposed textual convolutional neural network is first pre-trained on the captions of the simple biomedical figures. Then, the pre-trained textual model is fine-tuned on current multi-label compound figures to capture more biomedical features. Finally, the cross-modal multi-label learning model combines outputs of the visual and textual models to predict labels using a multi-stage fusion strategy. Result The proposed cross-modal multi-label classification model based on hybrid transfer learning is evaluated on the dataset of the multi-label classification task in ImageCLEF2016. Our approach is evaluated based on multi-label classification Hamming Loss and Macro F1 Score, according to the evaluation criterion of the benchmark. The two comparative models learn multi-label information only from visual content. They pre-train AlexNet on large-scale nature images. Then, the DeCAF features are extracted from the pre-trained AlexNet and fed into the SVM classifier with a linear kernel. One comparative model predicts modalities by the highest score of SVM and the other model predicts by the highest posterior probability. The visual model achieves 33.9% lower Hamming Loss and 100.3% higher Macro F1 Score by introducing the homogeneous transfer learning technique, and the textual model effectively improves the performance in the two metrics. Thus, the proposed cross-modal model achieves a Hamming Loss of 0.0157, similar to that of the state-of-the-art model, and obtains a 52.5% higher Macro F1 Score, which increases from 0.320 to 0.488. Conclusion A new method to extract biomedical modalities from compound figures is proposed. The proposed models obtain more competitive results than the other reported methods in the literature. The proposed cross-modal model exhibits acceptable generalization capability and could achieve higher performance. 
The results imply that the homogeneous transfer learning method can aid deep convolutional neural networks (DCNNs) to capture a larger number of biomedical domain features and improve the performance of multi-label classification. The proposed cross-modal model addresses the problems of overfitting and imbalanced datasets and effectively recognizes modalities from biomedical compound figures based on visual content and textual information. In the future, building deeper networks and training them with new techniques could further improve the proposed method.
Key words
multi-label learning; convolutional neural network; transfer learning; biomedical image; deep learning
0 Introduction
The rapidly growing biomedical literature [1] contains a large number of compound medical images [2]. The subfigures of a compound figure often belong to several biomedical modalities and provide different views of the same medical problem. These subfigures are not chosen at random; there is a clear semantic correlation among them [3], which makes them valuable references for research and teaching. However, the uncertainty of subfigure modalities and the variety of their combinations make compound figures difficult to retrieve and understand. One feasible approach is to first separate a compound figure into several single-label subfigures [4-6] and then recognize each with a biomedical modality classification algorithm [7-9]. The effectiveness of this approach depends on the performance of two sub-models: the separation model and the subfigure modality classifier. In recent years, researchers have attempted to build multi-label classification models [3, 10-11] that extract medical modality information directly from the original compound figure as a basis for image retrieval and understanding.
As in general-domain multi-label classification, multi-label classification models for medical images face three main challenges: imbalanced label distribution, small annotated datasets, and label dependency. First, when demonstrating a medical problem, authors of medical articles tend to combine semantically related images of multiple modalities into one compound figure [3]. Subfigures of different sizes may appear at various positions, and the number of compound-figure instances per label is unbalanced, so the multi-label task is more complex than single-label medical image classification. Second, a dataset that is small relative to the huge number of model parameters easily leads to overfitting, and annotation is expensive: manually labeling millions of medical images would consume enormous human and material resources. Third, interdependent labels produce a huge prediction space, since the number of label subsets grows exponentially with the number of class labels. For example, the ImageCLEF biomedical image data contain 30 class labels, which means a label space of more than one billion (i.e., $2^{30}$) subsets. Many label sets rarely appear in the training set, and a classifier that learns each label set separately would perform poorly.
Given the great success [13-16] of deep convolutional neural networks (CNNs) in natural image classification [12], researchers have used CNNs pre-trained on natural images to build multi-label classification models for medical images, with good results [3, 10-11]. For example, in the ImageCLEF2016 multi-label classification task, Kumar et al. [11] extracted features with a pre-trained AlexNet [13] and trained a multi-class SVM classifier, achieving a label-prediction Hamming loss as low as 0.0131. However, because the label distribution is severely unbalanced, the Hamming loss metric cannot reveal whether a classifier overfits the majority classes. Although the multi-label model of Kumar et al. [11] makes few misclassifications, its F1 score is only 0.320, indicating that its precision or recall still needs improvement.
To address these challenges, this paper, on the one hand, builds convolutional neural networks based on hybrid transfer learning for multi-label classification: heterogeneous transfer learning overcomes the overfitting caused by the small annotated dataset, and homogeneous transfer learning weakens the negative effect of the imbalanced dataset. On the other hand, the model assumes that the current image may be related to every label, predicts the relevance probability of all labels, and narrows the prediction space with a staged label-calibration algorithm to determine the final relevant label set.
Experiments on the ImageCLEF2016 multi-label classification dataset show that the proposed method obtains a Hamming loss close to that of the state-of-the-art algorithm while improving the F1 score by 52.5%, indicating its potential for multi-label classification of figures in the biomedical literature.
1 Related work
Classification is one of the main tasks of machine learning: given a set of training samples with features and classes, the goal is to obtain a model that assigns the correct class (label) to an unseen sample. Implicit in this formulation is the constraint that each sample has exactly one label. In many real-world scenarios, however, a sample may correspond to multiple labels; compound figures in the biomedical literature, for example, usually correspond to several medical image modalities. In contrast to classic supervised learning (also called single-label classification), such problems are called multi-label classification [17].
1.1 Multi-label classification algorithms
Researchers have carried out extensive work on algorithm design, label relationships, and threshold calibration [18].
Multi-label classification algorithms are designed from two perspectives. One is to adapt popular supervised learning algorithms to handle multi-label data. Given the remarkable results of many single-label image classification methods [13-16] on the natural-image benchmark ImageNet [12], a growing number of researchers have applied advanced deep convolutional neural networks to multi-label image classification [11, 19-21]. The other perspective is to transform multi-label data into other learning frameworks, such as chains of binary classifiers [22] or ensembles of multi-class classification problems [23].
One challenge of multi-label classification is the huge output space: the number of label subsets grows exponentially with the number of labels. To counter the negative effect of this huge label space, the traditional approach captures label-correlation information from the training set to assist the learning process [18]. According to the order of the correlations modeled, these techniques fall roughly into three categories. First, "first-order" techniques handle labels one by one, ignoring co-occurrence with other labels, for example by decomposing the multi-label problem into many binary classification problems [24-25]. Their greatest advantages are conceptual simplicity and efficiency, at the cost of ignoring label correlations, so the result is not necessarily optimal. Second, "second-order" techniques consider pairwise label relationships, for example by ranking relevant against irrelevant labels [17, 26]. Because label correlations are considered, second-order strategies improve generalization to some extent; in the real world, however, label relationships are not always pairwise. Third, "high-order" techniques model higher-order relationships among labels, for example by assuming that all labels are correlated [22] or by randomly generating label subsets [23]. High-order strategies clearly have stronger modeling power for label relationships, but at a higher computational cost.
Popular CNN-based multi-label classification algorithms assume that the current image sample may be associated with all labels, learn label dependencies from the training set, and divide the labels of each compound figure into relevant and irrelevant sets.
Multi-label classification models such as deep convolutional neural networks usually return a set of real-valued outputs, which a threshold calibration function then converts into relevant and irrelevant labels. The threshold calibration function is usually realized with one of two strategies. One uses a constant to calibrate labels, such as the popular choice of 0.5 [10, 27], or determines the calibration constant by minimizing the difference between the label sets of the training set and the predicted label sets [22]. The other learns the calibration function from the training set with a stacking-style procedure [28]. In practice, however, annotated medical image datasets are small, and if the label distributions of the training and test sets are inconsistent, dynamic thresholds cannot show their advantage.
1.2 Transfer learning in multi-label classification
In the biomedical image domain, researchers have begun to apply convolutional neural networks to multi-label classification problems, such as recognizing types of interstitial lung disease [29]. Deep CNNs have enormous numbers of parameters, while multi-label annotated samples in this domain are scarce, and re-annotating a corpus of millions of images is time-consuming and expensive; transfer learning is therefore a good choice for alleviating overfitting [11, 19-21]. Kumar et al. [11] extracted 4096-dimensional features from an AlexNet [13] pre-trained on ImageNet, reduced them to 1453 dimensions with principal component analysis (PCA), and trained a multi-class support vector machine for multi-label classification of biomedical compound figures. Wang et al. [19] extracted image representations from a VGG [14] pre-trained on ImageNet and fused them with a label-relation learning model for multi-label classification of natural images. Inspired by the idea of Yang et al. [20] of introducing local bounding boxes to assist multi-label classification, Yu et al. [21] used an AlexNet [13] pre-trained on ImageNet to jointly learn feature representations at the global and local image scales for multi-label classification of scene images in Pascal VOC [30].
In the transfer learning above, homogeneous transfer learning involves data from similar domains, such as the natural images of ImageNet and the scene images of Pascal VOC, whereas in heterogeneous transfer learning the source and target domains usually differ greatly, such as the natural images of ImageNet and the biomedical images of ImageCLEF. To the best of our knowledge, combining the two kinds of transfer learning has rarely been studied for multi-label classification of biomedical images.
1.3 Cross-modal multi-label classification
2 Cross-modal multi-label classification algorithm
2.1 Multi-label classification
Following the notation of [36], suppose the training set is
$T = \{(x_1, Y_1), (x_2, Y_2), \cdots, (x_n, Y_n)\}\ (x_i \in X, Y_i \subseteq L)$  (1)
The goal is to learn a multi-label classifier
$h(x_i) = \{\,y \mid f(x_i, y) > t,\ y \in L\,\}$  (2)
where $f(x_i, y)$ is a real-valued scoring function and $t$ is a threshold.
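Equation (2) can be sketched in a few lines of Python (the label names, scores, and threshold below are hypothetical, purely to illustrate the definition):

```python
def predict_labels(scores, labels, t=0.5):
    """Return the relevant label set {y | f(x, y) > t} of Eq. (2)."""
    return {y for y, s in zip(labels, scores) if s > t}

labels = ["DRUS", "DRMR", "GFIG"]   # hypothetical label names
scores = [0.91, 0.12, 0.64]         # hypothetical f(x, y) for each label
relevant = predict_labels(scores, labels)   # {'DRUS', 'GFIG'}
```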
2.2 Transfer learning models
The proposed multi-label transfer learning model consists of a textual part and a visual part, as shown in Fig. 1.
2.2.1 Visual model
ResNet-50 [16], a state-of-the-art very deep convolutional neural network with 50 layers of residual learning, has achieved excellent results in natural image recognition [12]. The network was originally designed for multi-class classification, so its structure must be slightly modified for the multi-label task: the loss function is changed to binary cross-entropy, and the softmax of the last layer is replaced with the sigmoid activation
$Sigmoid(x) = \frac{1}{1 + \exp(-x)}$  (3)
which estimates the relevance posterior probability of each label.
Let $\boldsymbol{w}$ denote the network weights. The loss over the training set $\boldsymbol{X}$ is
$L(\boldsymbol{w}, \boldsymbol{X}) = \frac{1}{n}\sum_{i=1}^{n} l\left(f(x_i, \boldsymbol{w}), \boldsymbol{y}'_i\right)$  (4)
where $f(x_i, \boldsymbol{w})$ is the vector $\boldsymbol{y}_i$ of predicted label probabilities for instance $x_i$, $\boldsymbol{y}'_i$ is its ground-truth label vector, and $l$ is the binary cross-entropy
$l(\boldsymbol{y}_i, \boldsymbol{y}'_i) = -\sum_{j=1}^{q}\left(y'_{ij}\log y_{ij} + (1 - y'_{ij})\log(1 - y_{ij})\right)$  (5)
where $q$ is the number of labels, $y_{ij}$ is the predicted probability of label $j$ for instance $i$, and $y'_{ij} \in \{0, 1\}$ indicates whether label $j$ is relevant.
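Equations (3) and (5) can be checked with a small worked example in plain Python (the logits and ground-truth vector are hypothetical):

```python
import math

def sigmoid(x):
    # Eq. (3): estimate the relevance posterior probability of a label
    return 1.0 / (1.0 + math.exp(-x))

def bce_loss(y_pred, y_true):
    # Eq. (5): binary cross-entropy summed over the q labels
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for p, t in zip(y_pred, y_true))

logits = [2.0, -1.0, 0.0]             # hypothetical network outputs
probs = [sigmoid(z) for z in logits]  # per-label posterior probabilities
loss = bce_loss(probs, [1, 0, 1])     # ground truth: labels 1 and 3 relevant
```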
The visual model is trained with a hybrid transfer learning procedure:
First, heterogeneous transfer learning from natural images exploits massive image information to alleviate overfitting and keeps the model sensitive to general features of the common domain, such as color, texture, and shape. Specifically, the ResNet-50 network is built in Keras and loaded with the weights released by the Keras authors, yielding a network pre-trained on the natural image dataset ImageNet [12].
Second, homogeneous transfer learning uses single-label simple biomedical figures. A compound biomedical figure contains several modalities, and although the training set provides its labels, they are not mapped to specific subfigure positions; learning from single-label figures therefore associates image content with labels and weakens the negative effect of the imbalanced label distribution in the compound-figure dataset. Specifically, the weights of most layers of the pre-trained ResNet-50 are frozen, and the top fully connected layer is retrained on the single-modality medical image datasets of ImageCLEF2013 and ImageCLEF2016.
Finally, the network obtained from the two transfer learning steps is trained on the multi-label medical image data to perform multi-label classification and predict the relevance posterior probability of each label.
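The staged freeze/fine-tune schedule described above can be illustrated schematically. This is not the paper's Keras implementation; plain dicts stand in for real network layers, purely to show which weights are trainable at each stage:

```python
# Schematic of the three-stage hybrid transfer schedule (illustrative only).
def make_model(n_layers):
    return [{"name": f"layer{i}", "trainable": True} for i in range(n_layers)]

def freeze_all_but_top(model, n_top):
    # Freeze every layer except the top n_top layers.
    for layer in model[:-n_top]:
        layer["trainable"] = False

model = make_model(50)         # stage 1: weights pre-trained on ImageNet
freeze_all_but_top(model, 1)   # stage 2: retrain only the top layer on
                               #          single-label biomedical figures
for layer in model:            # stage 3: fine-tune the whole network on
    layer["trainable"] = True  #          multi-label compound figures
```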
2.2.2 Textual model
Before training the textual model, the captions of all figures in the articles of ImageCLEF2013, ImageCLEF2015, and ImageCLEF2016 are collected and used to pre-train word vectors with an embedding tool.
This paper uses the convolutional neural network from our previous work [37], comprising an embedding layer, a convolutional layer, a global max-pooling layer, and two fully connected layers. The embedding layer, built from the caption texts of the training set and the word-vector dictionary, converts the input text into word-vector form and pads or truncates each sentence to a fixed length.
The textual model is trained with homogeneous transfer learning: first, similar to the homogeneous transfer step of the visual model, the network is trained on the captions of single-label simple biomedical figures; then the network weights are fine-tuned on the captions of compound figures for multi-label classification.
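The convolution-plus-global-max-pooling stage of the text CNN can be sketched in plain Python. The sequence and filter values below are hypothetical, and a real model would convolve over word-vector matrices rather than a 1-D sequence:

```python
def conv1d(seq, kernel):
    """Valid 1-D convolution of a sequence with one filter."""
    k = len(kernel)
    return [sum(seq[i + j] * kernel[j] for j in range(k))
            for i in range(len(seq) - k + 1)]

def global_max_pool(feature_map):
    # Keep only the strongest response of the filter over the sentence.
    return max(feature_map)

seq = [0.1, 0.9, 0.4, 0.7]       # hypothetical embedded sentence (1-D)
fm = conv1d(seq, [1.0, -1.0])    # one filter of width 2
pooled = global_max_pool(fm)     # strongest response, here ~0.5
```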
2.3 Cross-modal label calibration algorithm
Given the posterior probabilities $p_j$ output by the classifier for a test instance, each label $y$ is calibrated against a threshold $t$ as
$y = \begin{cases} 1 & p_j \ge t \\ 0 & p_j < t \end{cases}$  (6)
When calibrating labels for the current sample, the labels whose posterior probability from the visual model exceeds the threshold are added to the relevant label set. The threshold is usually chosen in one of two ways. One is a fixed threshold, typically the popular constant 0.5. The other determines the threshold dynamically by minimizing the difference between the label cardinality of the training set and that of the predictions, i.e.,
$t = \arg\min_{t}\left| LCard(\boldsymbol{X}) - \frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{q} 1_{p_{ij} > t} \right|$  (7)
where $p_{ij}$ is the predicted probability of label $j$ for instance $i$ and $LCard(\boldsymbol{X})$ is the label cardinality of the training set, i.e., the average number of labels per instance
$LCard(\boldsymbol{X}) = \frac{1}{n}\sum_{i=1}^{n} |\boldsymbol{Y}_i|$  (8)
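The cardinality-minimizing calibration of Eqs. (7) and (8) can be sketched as follows; the probability matrix, label sets, and candidate thresholds are hypothetical toy values:

```python
def label_cardinality(label_sets):
    # Eq. (8): average number of labels per training instance
    return sum(len(Y) for Y in label_sets) / len(label_sets)

def calibrate_threshold(train_card, prob_matrix, candidates):
    # Eq. (7): pick the t whose predicted label cardinality is closest
    # to the label cardinality of the training set.
    def predicted_card(t):
        m = len(prob_matrix)
        return sum(p > t for row in prob_matrix for p in row) / m
    return min(candidates, key=lambda t: abs(train_card - predicted_card(t)))

train_card = label_cardinality([{"DRMR"}, {"DRCT", "GFIG"}])   # 1.5
probs = [[0.9, 0.4, 0.1], [0.8, 0.6, 0.2]]                     # hypothetical
t = calibrate_threshold(train_card, probs, [0.1, 0.3, 0.5, 0.7])   # 0.5
```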
The cross-modal model of this paper (Fig. 1) combines a global best-selection rule with a mean rule. Labels are first calibrated by the threshold function (fixed threshold 0.5) applied to the posterior probabilities of the visual model; if the relevant label set of a sample is then empty, the posterior probabilities of all labels from the visual and textual models are averaged, and the label with the highest mean probability is taken as the relevant label.
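The staged fusion rule can be sketched as below (label names and probabilities are hypothetical; the real models output 30-dimensional posteriors):

```python
def fuse(visual_probs, text_probs, labels, t=0.5):
    """Staged fusion: threshold the visual posteriors at t; if no label
    survives, fall back to the per-label mean of both modalities."""
    relevant = {y for y, p in zip(labels, visual_probs) if p >= t}
    if not relevant:
        means = [(v + x) / 2 for v, x in zip(visual_probs, text_probs)]
        relevant = {labels[means.index(max(means))]}
    return relevant

labels = ["DRMR", "GFIG", "GTAB"]                # hypothetical labels
fuse([0.7, 0.2, 0.1], [0.3, 0.4, 0.2], labels)   # visual alone -> {'DRMR'}
fuse([0.3, 0.2, 0.1], [0.2, 0.6, 0.1], labels)   # fallback mean -> {'GFIG'}
```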
3 Experiments
3.1 Datasets
The experiments use the dataset of the ImageCLEF2016 [3] multi-label classification task, whose images are compound figures from the biomedical literature in PubMed (Fig. 2). The training and test sets contain 1568 and 1083 figures, respectively, with corresponding captions. The labels are a subset of the classes of the ImageCLEF2013 [2] modality recognition task, namely the 30 classes remaining after the compound-figure class (COMP) is removed; the class codes and names are listed in Table 1.
Table 1
Thirty class codes of multi-label classification

Class code | Class name
DRUS | Ultrasound
DRMR | Magnetic resonance imaging
DRCT | Computerized tomography
DRXR | X-ray radiography
DRAN | Angiography
DRPE | Positron emission tomography
DRCO | Combined modalities in one image
DVDM | Dermatology images
DVEN | Endoscopy
DVOR | Images of other organs
DSEE | Electroencephalography
DSEC | Electrocardiography
DSEM | Electromyography
DMLI | Light microscopy
DMEL | Electron microscopy
DMTR | Transmission microscopy
DMFL | Fluorescence microscopy
D3DR | 3D reconstruction
GTAB | Tables
GPLI | Program listings
GFIG | Statistical figures and charts
GSCR | Screenshots
GFLO | Flowcharts
GSYS | System overviews
GGEN | Gene sequences
GGEL | Gel chromatography
GCHE | Chemical structures
GMAT | Mathematical formulae
GNCP | Non-clinical photos
GHDR | Hand-drawn sketches
The single-label medical image data come from two other medical image processing tasks: the modality classification task of ImageCLEF2013 and the subfigure modality classification task of ImageCLEF2016. The former contains 1796 training and 1568 test samples (excluding the COMP modality), and the latter contains 6676 and 4166 samples, respectively.
3.2 Experimental setup
3.3 Data preprocessing
When loaded, each image is resized to 224×224 and converted with the Keras preprocessing utilities into a 4-D tensor in channel-first order, i.e., with shape (batch, channels, height, width).
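The channel-first conversion can be illustrated in plain Python on a toy nested-list "image" (a real pipeline would use the Keras utilities on array data):

```python
def to_channel_first(image):
    """Convert an H×W×C nested-list image to C×H×W (channel-first)."""
    h, w, c = len(image), len(image[0]), len(image[0][0])
    return [[[image[i][j][k] for j in range(w)] for i in range(h)]
            for k in range(c)]

img = [[[1, 2, 3], [4, 5, 6]]]   # toy 1×2 image with 3 channels
chw = to_channel_first(img)      # 3×1×2: one plane per channel
```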
To improve the interpretability of the word vectors, as many biomedical figure captions as possible are collected. In addition to the captions provided with the ImageCLEF2016 training and test sets, the captions of all figures are extracted from the 300,000 medical articles of ImageCLEF2013. After serialization, the word vectors are pre-trained on this corpus.
3.4 Evaluation metrics
Evaluation metrics fall into example-based and label-based families. Following the evaluation criteria of [3, 10], two example-based metrics are used. Suppose the test set is
$S = \{(x_1, Y_1), (x_2, Y_2), \cdots, (x_m, Y_m)\}\ (x_i \in X, Y_i \subseteq L)$  (9)
Hamming loss (H-Loss) evaluates the number of misclassified instance-label pairs, ranging from 0 to 1, with 0 being the best result:
$hloss(h) = \frac{1}{m}\sum_{i=1}^{m}\frac{|h(x_i)\,\Delta\,\boldsymbol{Y}_i|}{|\boldsymbol{L}|}$  (10)
where Δ denotes the symmetric difference; mathematically, the symmetric difference of two sets is the set of elements that belong to exactly one of them.
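Equation (10) maps directly onto Python's set symmetric-difference operator `^`; the predicted and true label sets below are hypothetical:

```python
def hamming_loss(pred_sets, true_sets, n_labels):
    # Eq. (10): mean size of the symmetric difference, normalised by |L|
    m = len(pred_sets)
    return sum(len(h ^ Y) for h, Y in zip(pred_sets, true_sets)) / (m * n_labels)

pred = [{"DRMR"}, {"GFIG", "GTAB"}]        # hypothetical predictions
true = [{"DRMR", "DRCT"}, {"GFIG"}]        # hypothetical ground truth
hl = hamming_loss(pred, true, 30)          # (1 + 1) / (2 * 30) = 1/30
```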
Macro-averaged F1: precision $p$, recall $r$, and their harmonic mean F1 are computed as
$p(h) = \frac{1}{m}\sum_{i=1}^{m}\frac{|\boldsymbol{Y}_i \cap h(x_i)|}{|h(x_i)|}$  (11)
$r(h) = \frac{1}{m}\sum_{i=1}^{m}\frac{|\boldsymbol{Y}_i \cap h(x_i)|}{|\boldsymbol{Y}_i|}$  (12)
$F1(h) = 2 \times \frac{p(h) \times r(h)}{p(h) + r(h)}$  (13)
and the macro-averaged F1 averages the F1 score over the $q$ labels
$F1_{Macro} = \frac{1}{q}\sum_{i=1}^{q} F1_i, \quad y_i \in \boldsymbol{L}$  (14)
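Equations (11)-(13) can be computed directly from label sets; the worked example below uses the same hypothetical predictions as above:

```python
def example_f1(pred_sets, true_sets):
    # Eqs. (11)-(13): example-based precision, recall, and their F1
    m = len(pred_sets)
    p = sum(len(Y & h) / len(h) for h, Y in zip(pred_sets, true_sets)) / m
    r = sum(len(Y & h) / len(Y) for h, Y in zip(pred_sets, true_sets)) / m
    return 2 * p * r / (p + r)

pred = [{"DRMR"}, {"GFIG", "GTAB"}]   # p = (1/1 + 1/2)/2 = 0.75
true = [{"DRMR", "DRCT"}, {"GFIG"}]   # r = (1/2 + 1/1)/2 = 0.75
f1 = example_f1(pred, true)           # 0.75
```

Equation (14) then averages per-label F1 scores over the $q$ labels to obtain the macro-averaged value reported in the tables.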
3.5 Results and discussion
3.5.1 Comparison of multi-label classification algorithms
The two best results of the ImageCLEF2016 multi-label classification task are chosen as comparison baselines. Both are content-based multi-label classification models: AlexNet is pre-trained on ImageNet and transferred to the current dataset, DeCAF features are extracted to train an SVM classifier, and the label with the highest SVM score or highest posterior probability is calibrated as the only relevant label. As shown in Table 2, the baselines perform well, with a Hamming loss as low as 0.0131 and a macro-averaged F1 of up to 0.320.
Table 2
Results of multi-label classification methods in ImageCLEF2016

Method | 10FCV H-Loss | 10FCV Macro-F1 | Test H-Loss | Test Macro-F1
BMET MLC1 [11] | - | - | 0.0131 | 0.295
BMET MLC2 [11] | - | - | 0.0135 | 0.320
Hetero_TL_V | 0.0281 | 0.171 | 0.0242 | 0.237
Hybrid_TL_V | 0.0224 | 0.316 | 0.0160 | 0.482
No_TL_T | 0.0365 | 0.082 | 0.0364 | 0.024
Homo_TL_T | 0.0329 | 0.117 | 0.0239 | 0.185
Hybrid_TL_Cross-Modal | 0.0224 | 0.333 | 0.0157 | 0.488
Compared with the state-of-the-art algorithm in this field, the proposed cross-modal classification algorithm based on hybrid transfer learning (Hybrid_TL_Cross-Modal) achieves a similar Hamming loss, calibrating labels fairly accurately with a Hamming loss as low as 0.0157, while the macro-averaged F1 rises from 0.320 to 0.488, an improvement of about 52.5%.
3.5.2 Comparison of transfer learning methods
When extracting label information from visual content, this paper introduces homogeneous transfer learning on top of conventional heterogeneous transfer learning, yielding the hybrid method Hybrid_TL_V. As Table 2 shows, compared with Hetero_TL_V, Hybrid_TL_V reduces the Hamming loss by 33.9% and raises the macro-averaged F1 by 100.3% on the test set (from 0.0242 to 0.0160 and from 0.237 to 0.482), indicating that homogeneous transfer learning helps the network capture more biomedical domain features.
When recognizing labels from textual information, the homogeneous transfer learning method (Homo_TL_T) is compared with the method without transfer learning (No_TL_T). As Table 2 shows, Homo_TL_T achieves a Hamming loss of 0.0239 and a macro-averaged F1 of 0.185 on the test set, clearly outperforming No_TL_T (0.0364 and 0.024).
3.5.3 Cross-modal label calibration
To further improve multi-label classification performance, the two best single-modality algorithms, Hybrid_TL_V and Homo_TL_T, are fused into the transfer-learning-based cross-modal multi-label classification algorithm Hybrid_TL_Cross-Modal. The cross-modal algorithm achieves a lower Hamming loss of 0.0157 and a higher macro-averaged F1 of 0.488, as shown in Table 3.
Table 3
Comparison of threshold calibration methods

Method | 10FCV H-Loss | 10FCV Macro-F1 | Test H-Loss | Test Macro-F1
Minimizing_LCard | 0.0267 | 0.348 | 0.0206 | 0.477
Threshold_0.5 | 0.0226 | 0.326 | 0.0161 | 0.470
Highest_Probability | 0.0226 | 0.287 | 0.0150 | 0.438
TH_0.5 | 0.0224 | 0.333 | 0.0157 | 0.488
The fusion method suggested by Tahir et al. [41] is tried first: when fusing the results of the visual and textual models, a per-label statistic of the two models' predictions on the training set (e.g., macro- or micro-averaged F1) is used to select the better-performing model for each label.
The fixed-threshold method calibrates labels with the constant 0.5. When the mean of the predicted probabilities of the two modality models is calibrated in this way (Threshold_0.5 in Table 3), the method shows some potential, with a Hamming loss of 0.0161 and a macro-averaged F1 of 0.470 on the test set.
This paper also tries the method suggested by Read et al. [22] of dynamically determining the threshold by minimizing the difference between the predicted and training label cardinalities. The cardinality-minimizing method (Minimizing_LCard in Table 3) yields a dynamic threshold of 0.296; calibrating labels with the strategy adopted in this paper then gives a Hamming loss of 0.0206 and a macro-averaged F1 of 0.477 on the test set, which is inferior to the proposed TH_0.5 fusion in Hamming loss.
4 Conclusion
The visual classification model built in this paper uses heterogeneous transfer learning to learn general features of the common domain from natural images, and then uses homogeneous transfer learning to learn more specialized biomedical domain features from single-label simple biomedical figures, thereby improving the accuracy, precision, and recall of label classification. The textual classification model also improves label classification performance markedly after homogeneous transfer learning is introduced. The proposed multi-label classification method for biomedical images captures modality features from both image content and the associated captions, and after fusing the two modality-specific models it calibrates biomedical modality labels more effectively than existing methods. The experimental results show that the method is suitable for extracting subfigure modality information from compound figures in the biomedical literature and better alleviates overfitting to the majority classes. After the recall of multi-label classification is improved, the accuracy drops slightly compared with previous methods. In future work, studying the latent label relationships in the training data and exploiting label co-occurrence information is expected to further improve the accuracy of label calibration.
References
-
[1] Lu Z Y. PubMed and beyond:a survey of web tools for searching biomedical literature[J]. Database, 2011, 2011: baq036. [DOI:10.1093/database/baq036]
-
[2] De Herrera A G S, Kalpathy-Cramer J, Fushman D D, et al. Overview of the ImageCLEF 2013 medical tasks[C]//Working Notes of CLEF 2013 Conference. Valencia, Spain: CEUR-WS, 2013: 1-15.
-
[3] De Herrera A G S, Schaer R, Bromuri S, et al. Overview of the ImageCLEF 2016 medical task[C]//Working Notes of CLEF 2016 Conference. Évora, Portugal: CEUR-WS, 2016: 219-232.
-
[4] Santosh K C, Xue Z Y, Antani S K, et al. NLM at ImageCLEF2015: biomedical multipanel figure separation[C]//Working Notes of CLEF 2015 Conference. Toulouse, France: CEUR-WS, 2015: 1-8.
-
[5] Li P Y, Sorensen S, Kolagunda A, et al. UDEL CIS working notes in ImageCLEF 2016[C]//Working Notes of CLEF 2016 Conference. Évora, Portugal: CEUR-WS, 2016: 334-346.
-
[6] Santosh K C, Aafaque A, Antani S, et al. Line segment-based stitched multipanel figure separation for effective biomedical CBIR[J]. International Journal of Pattern Recognition and Artificial Intelligence, 2017, 31(6): 1757003. [DOI:10.1142/S0218001417570038]
-
[7] Pelka O, Friedrich C M. FHDO biomedical computer science group at medical classification task of ImageCLEF 2015[C]//Working Notes of CLEF 2015 Conference. Toulouse, France: CEUR-WS, 2015: 1-14.
-
[8] Koitka S, Friedrich C M. Traditional feature engineering and deep learning approaches at medical classification task of ImageCLEF 2016[C]//Working Notes of CLEF 2016 Conference. Évora, Portugal: CEUR-WS, 2016: 304-317.
-
[9] Kumar A, Kim J, Lyndon D, et al. An ensemble of fine-tuned convolutional neural networks for medical image classification[J]. IEEE Journal of Biomedical and Health Informatics, 2017, 21(1): 31–40. [DOI:10.1109/JBHI.2016.2635663]
-
[10] De Herrera A G S, Müller H, Bromuri S. Overview of the ImageCLEF 2015 medical classification task[C]//Working Notes of CLEF 2015 Conference. Toulouse, France: CEUR-WS, 2015: 1-13.
-
[11] Kumar A, Lyndon D, Kim J, et al. Subfigure and multi-label classification using a fine-tuned convolutional neural network[C]//Working Notes of CLEF 2016 Conference. Évora, Portugal: CEUR-WS, 2016: 318-321.
-
[12] Russakovsky O, Deng J, Su H, et al. Imagenet large scale visual recognition challenge[J]. International Journal of Computer Vision, 2015, 115(3): 211–252. [DOI:10.1007/s11263-015-0816-y]
-
[13] Krizhevsky A, Sutskever I, Hinton G E. Imagenet classification with deep convolutional neural networks[C]//Proceedings of Advances in Neural Information Processing Systems. Lake Tahoe, USA: NIPS, 2012: 1097-1105. [DOI:10.1145/3065386]
-
[14] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[C]//Proceedings of the 2015 International Conference for Learning Representations. San Diego, USA: ICLR, 2015: 1-14.
-
[15] Szegedy C, Liu W, Jia Y Q, et al. Going deeper with convolutions[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, Massachusetts, USA: IEEE, 2015: 1-9. [DOI:10.1109/cvpr.2015.7298594]
-
[16] He K M, Zhang X Y, Ren S Q, et al. Deep residual learning for image recognition[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 770-778. [DOI:10.1109/cvpr.2016.90]
-
[17] Schapire R E, Singer Y. BoosTexter:a boosting-based system for text categorization[J]. Machine Learning, 2000, 39(2-3): 135–168. [DOI:10.1023/A:1007649029923]
-
[18] Zhang M L, Zhou Z H. A review on multi-label learning algorithms[J]. IEEE Transactions on Knowledge and Data Engineering, 2014, 26(8): 1819–1837. [DOI:10.1109/TKDE.2013.39]
-
[19] Wang J, Yang Y, Mao J H, et al. CNN-RNN: a unified framework for multi-label image classification[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 2285-2294. [DOI:10.1109/cvpr.2016.251]
-
[20] Yang H, Zhou J T, Zhang Y, et al. Exploit bounding box annotations for multi-label object recognition[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 280-288. [DOI:10.1109/cvpr.2016.37]
-
[21] Yu Q H, Wang J J, Zhang S Z, et al. Combining local and global hypotheses in deep neural network for multi-label image classification[J]. Neurocomputing, 2017, 235: 38–45. [DOI:10.1016/j.neucom.2016.12.051]
-
[22] Read J, Pfahringer B, Holmes G, et al. Classifier chains for multi-label classification[J]. Machine Learning, 2011, 85(3): 333–359. [DOI:10.1007/s10994-011-5256-5]
-
[23] Tsoumakas G, Katakis I, Vlahavas I. Random k-labelsets for multilabel classification[J]. IEEE Transactions on Knowledge and Data Engineering, 2011, 23(7): 1079–1089. [DOI:10.1109/TKDE.2010.164]
-
[24] Boutell M R, Luo J B, Shen X P, et al. Learning multi-label scene classification[J]. Pattern Recognition, 2004, 37(9): 1757–1771. [DOI:10.1016/j.patcog.2004.03.009]
-
[25] Zhang M L, Zhou Z H. ML-KNN:a lazy learning approach to multi-label learning[J]. Pattern Recognition, 2007, 40(7): 2038–2048. [DOI:10.1016/j.patcog.2006.12.019]
-
[26] Elisseeff A, Weston J. A kernel method for multi-labelled classification[C]//Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic. Vancouver, British Columbia, Canada: ACM, 2001: 681-687.
-
[27] Clare A, King R D. Knowledge discovery in multi-label phenotype data[C]//Proceedings of the 5th European Conference on Principles of Data Mining and Knowledge Discovery. Freiburg, Germany: Springer, 2001: 42-53. [DOI:10.1007/3-540-44794-6_4]
-
[28] Quevedo J R, Luaces O, Bahamonde A. Multilabel classifiers with a probabilistic thresholding strategy[J]. Pattern Recognition, 2012, 45(2): 876–883. [DOI:10.1016/j.patcog.2011.08.007]
-
[29] Gao M C, Xu Z Y, Lu L, et al. Holistic interstitial lung disease detection using deep convolutional neural networks: multi-label learning and unordered pooling[J]. arXiv preprint arXiv:1701.05616, 2017.
-
[30] Everingham M, Van Gool L, Williams C K I, et al. The pascal visual object classes (VOC) challenge[J]. International Journal of Computer Vision, 2010, 88(2): 303–338. [DOI:10.1007/s11263-009-0275-4]
-
[31] You D, Rahman M M, Antani S, et al. Text-and content-based biomedical image modality classification[C]//Proceedings of the SPIE Volume 8674, Medical Imaging 2013: Advanced PACS-based Imaging Informatics and Therapeutic Applications. Lake Buena Vista, FL, USA: SPIE, 2013, 8674: 86740L. [DOI:10.1117/12.2007932]
-
[32] Rahman M M, You D, Simpson M S, et al. Multimodal biomedical image retrieval using hierarchical classification and modality fusion[J]. International Journal of Multimedia Information Retrieval, 2013, 2(3): 159–173. [DOI:10.1007/s13735-013-0038-4]
-
[33] Codella N, Connell J, Pankanti S, et al. Automated medical image modality recognition by fusion of visual and text information[C]//Proceedings of the 17th International Conference on Medical Image Computing and Computer-Assisted Intervention. Boston, MA, USA: Springer, 2014: 487-495. [DOI:10.1007/978-3-319-10470-6_61]
-
[34] Yu Y H, Lin H F, Yu Q H, et al. Modality classification for medical images using multiple deep convolutional neural networks[J]. Journal of Computational Information Systems, 2015, 11(15): 5403–5413. [DOI:10.12733/jcis14859]
-
[35] Cheng B B, Stanley R J, Antani S, et al. Graphical figure classification using data fusion for integrating text and image features[C]//Proceedings of the 12th International Conference on Document Analysis and Recognition. Washington, DC, USA: IEEE, 2013: 693-697. [DOI:10.1109/icdar.2013.142]
-
[36] Bromuri S, Zufferey D, Hennebert J, et al. Multi-label classification of chronically ill patients with bag of words and supervised dimensionality reduction algorithms[J]. Journal of Biomedical Informatics, 2014, 51: 165–175. [DOI:10.1016/j.jbi.2014.05.010]
-
[37] Yu Y H, Lin H F, Meng J N, et al. Visual and textual sentiment analysis of a microblog using deep convolutional neural networks[J]. Algorithms, 2016, 9(2): 41. [DOI:10.3390/a9020041]
-
[38] Chen D, Riddle D L. Function of the PHA-4/FOXA transcription factor during C. elegans post-embryonic development[J]. BMC Developmental Biology, 2008, 8: 26. [DOI:10.1186/1471-213X-8-26]
-
[39] Yu Y H, Lin H F, Meng J N, et al. Assembling deep neural networks for medical compound figure detection[J]. Information, 2017, 8(2): 48. [DOI:10.3390/info8020048]
-
[40] Yu Y H, Lin H F, Meng J N, et al. Deep transfer learning for modality classification of medical images[J]. Information, 2017, 8(3): 91. [DOI:10.3390/info8030091]
-
[41] Tahir M A, Kittler J, Bouridane A. Multilabel classification using heterogeneous ensemble of multi-label classifiers[J]. Pattern Recognition Letters, 2012, 33(5): 513–523. [DOI:10.1016/j.patrec.2011.10.019]