Classification modeling and recognition for cross-modal and multi-label biomedical images
2018, Vol. 23, No. 6, Pages 917-927
Received: 2017-10-26
Revised: 2017-12-05
Published in print: 2018-06-16
DOI: 10.11834/jig.170556

Objective
Figures in the biomedical literature are often compound images that contain multiple modalities. Automatically annotating their modality classes helps improve image retrieval performance and supports medical research and teaching.
Method
Information from two modalities, image content and caption text, is fused, and a multi-label classification model based on deep convolutional neural networks is built for each. The visual classification model borrows natural images and single-label simple biomedical figures to perform heterogeneous and homogeneous transfer learning, capturing general features of the broad domain and features specific to the biomedical domain, while the text classification model uses the captions of simple biomedical figures for homogeneous transfer learning. A staged fusion strategy then combines the outputs of the two modality models to recognize the modalities present in a multi-label medical image.
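The caption model described above can be sketched as a standard text CNN in PyTorch; this is a minimal illustration, and the embedding source format, filter sizes, and label count are assumptions rather than the paper's actual configuration.

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    """Caption classifier: embedding layer initialized from word vectors
    pre-trained on biomedical captions (homogeneous transfer), 1-D
    convolutions over word windows, sigmoid multi-label output."""
    def __init__(self, pretrained_embeddings, num_labels,
                 kernel_sizes=(3, 4, 5), num_filters=100):
        super().__init__()
        vocab_size, embed_dim = pretrained_embeddings.shape
        # Keep the pre-trained vectors trainable so they are updated
        # while fine-tuning on the target captions.
        self.embedding = nn.Embedding.from_pretrained(
            torch.as_tensor(pretrained_embeddings, dtype=torch.float32),
            freeze=False)
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes)
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_labels)

    def forward(self, token_ids):                       # (batch, seq_len)
        x = self.embedding(token_ids).transpose(1, 2)   # (batch, embed_dim, seq_len)
        pooled = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        logits = self.fc(torch.cat(pooled, dim=1))
        return torch.sigmoid(logits)   # independent per-label probabilities
```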
Result
The proposed cross-modal multi-label classification algorithm is evaluated on the dataset of the ImageCLEF2016 biomedical image multi-label classification task. The hybrid transfer learning method based on image content achieves a lower Hamming loss and a higher macro-averaged F1 score than the method that uses heterogeneous transfer learning alone. Introducing homogeneous transfer learning into the text classification model markedly improves label classification performance. Finally, fusing the multi-label classification models of the two modalities yields a Hamming loss close to the best score of the evaluation task, while the macro-averaged F1 score rises from 0.320 to 0.488, an improvement of about 52.5%.
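For reference, the two reported metrics can be computed with scikit-learn; the indicator matrices below are toy values for illustration only.

```python
import numpy as np
from sklearn.metrics import hamming_loss, f1_score

# Toy binary indicator matrices: 4 samples, 3 modality labels.
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]])
y_pred = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 1]])

# Hamming loss: fraction of label slots predicted incorrectly (lower is better).
print("Hamming loss:", hamming_loss(y_true, y_pred))
# Macro-averaged F1: F1 per label, then an unweighted mean over labels,
# so rare modalities count as much as frequent ones.
print("Macro F1:", f1_score(y_true, y_pred, average="macro"))
```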
Conclusion
The experimental results show that the proposed cross-modal multi-label classification algorithm for biomedical images, which fuses image content and caption text and introduces homogeneous and heterogeneous data for transfer learning, alleviates the small scale and imbalanced label distribution of annotated data in the biomedical imaging domain, recognizes modality information in compound medical images more effectively, and thereby improves image retrieval performance.
Objective
The amount of biomedical literature in electronic format has increased considerably with the development of the Internet. PubMed comprises more than 27 million citations for biomedical literature, linking to full-text content from PubMed Central and publisher web sites. The figures in these biomedical studies can be retrieved through tools along with the full text. However, the lack of associated metadata, apart from the captions, hinders the fulfillment of the richer information requirements of biomedical researchers and educators. The modality of a figure is an extremely useful type of metadata. Therefore, biomedical modality classification is an important first step that can help users access the biomedical images they require and further improve the performance of a literature retrieval system. Many images in the biomedical literature (more than 40%) are compound figures comprising several subfigures with various biomedical modalities, such as computerized tomography, X-ray, or generic biomedical illustrations. The subfigures in one compound figure may describe one medical problem from several views and have strong semantic correlations with each other. Thus, these figures are valuable to biomedical research and education. The standard approach to modality recognition from biomedical compound figures first detects whether a figure is compound. If it is, a figure separation algorithm is invoked to split it into its constituent subfigures, and a multi-class classifier then predicts the modality of each subfigure. However, figure separation algorithms are not perfect, and their errors propagate to the multi-class modality classifier. Recently, some multi-label learning models have used pre-trained convolutional neural networks to extract high-level features and recognize image modalities directly from compound figures. These deep learning methods learn more expressive representations of image data. However, convolutional neural networks may be hindered from disentangling the factors of variation by the limited number of samples with high variability and by the imbalanced label distribution of the training data. A new cross-modal multi-label classification model using convolutional neural networks based on hybrid transfer learning is presented to learn biomedical modality information from compound figures without separating them into subfigures.
Method
An end-to-end training and multi-label classification method, which does not require additional classifiers, is proposed. Two convolutional neural networks are built to learn the modality components of an image, not from single separated subfigures, but from labeled compound figures and their captions. The proposed cross-modal model learns general domain features from large-scale natural images and more specialized biomedical domain features from the simple figures and their captions in the biomedical literature, leveraging heterogeneous and homogeneous transfer learning. Specifically, the proposed visual convolutional neural network (CNN) is pre-trained on a large auxiliary dataset containing approximately 1.2 million labeled training images in 1000 classes. Then, the top layer of the deep CNN is trained from scratch on single-label simple biomedical figures to achieve homogeneous transfer learning. The key point of this transfer learning is fine-tuning the pre-trained deep visual models on the current multi-label compound figure dataset: the architecture of the deep visual models is changed slightly so that they can be fine-tuned on the current dataset.
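A minimal PyTorch sketch of this multi-stage recipe, using torchvision's ImageNet-pretrained AlexNet as the heterogeneous stage; the class counts and hyperparameters are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn
from torchvision import models

# Stage 1 (heterogeneous transfer): start from ImageNet-pretrained weights.
net = models.alexnet(pretrained=True)

# Stage 2 (homogeneous transfer): retrain the top layer from scratch on
# single-label simple biomedical figures (softmax over simple-figure classes).
num_simple_classes = 30                  # illustrative value, not from the paper
net.classifier[6] = nn.Linear(4096, num_simple_classes)
# ... train on simple figures with nn.CrossEntropyLoss() ...

# Stage 3: swap the output layer again and fine-tune end to end on
# multi-label compound figures, one sigmoid unit per modality label.
num_modalities = 30                      # illustrative label-set size
net.classifier[6] = nn.Linear(4096, num_modalities)
criterion = nn.BCEWithLogitsLoss()       # multi-label loss over sigmoid outputs
optimizer = torch.optim.SGD(net.parameters(), lr=1e-4, momentum=0.9)

def train_step(images, label_matrix):
    """One fine-tuning step; label_matrix is a float {0,1} tensor (batch, labels)."""
    optimizer.zero_grad()
    loss = criterion(net(images), label_matrix)
    loss.backward()
    optimizer.step()
    return loss.item()
```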
For the textual model, the weights of the embedding layer are initialized with word vectors pre-trained on captions extracted from 300 000 biomedical articles in PubMed and are updated while training the network. Similar to the homogeneous transfer learning strategy of the visual model, the proposed textual convolutional neural network is first pre-trained on the captions of simple biomedical figures. Then, the pre-trained textual model is fine-tuned on the current multi-label compound figures to capture more biomedical features. Finally, the cross-modal multi-label learning model combines the outputs of the visual and textual models to predict labels using a multi-stage fusion strategy.
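The abstract does not spell out the staged fusion rule; one plausible sketch, assuming a weighted late fusion of per-label probabilities followed by thresholding and a fallback to the strongest label, is given below.

```python
import numpy as np

def fuse_predictions(p_visual, p_textual, alpha=0.5, threshold=0.5):
    """Combine per-label probabilities from the two modality models.

    p_visual, p_textual: arrays of shape (num_labels,) with sigmoid outputs.
    alpha and threshold are illustrative values; the paper's staged fusion
    may weight the models and threshold labels differently.
    """
    p = alpha * p_visual + (1.0 - alpha) * p_textual   # stage 1: late fusion
    labels = p >= threshold                            # stage 2: per-label decision
    if not labels.any():                               # stage 3: ensure >= 1 label
        labels[np.argmax(p)] = True
    return labels
```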
Result
The proposed cross-modal multi-label classification model based on hybrid transfer learning is evaluated on the dataset of the multi-label classification task of ImageCLEF2016. Following the evaluation criteria of the benchmark, our approach is measured by the multi-label Hamming loss and macro-averaged F1 score. The two comparative models learn multi-label information from visual content only: AlexNet is pre-trained on large-scale natural images, DeCAF features are extracted from the pre-trained AlexNet and fed into an SVM classifier with a linear kernel, and one comparative model predicts modalities by the highest SVM score while the other predicts by the highest posterior probability. By introducing the homogeneous transfer learning technique, the visual model achieves a 33.9% lower Hamming loss and a 100.3% higher macro F1 score, and the textual model also improves markedly on both metrics. The proposed cross-modal model achieves a Hamming loss of 0.0157, similar to that of the state-of-the-art model, and obtains a 52.5% higher macro F1 score, which increases from 0.320 to 0.488.
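For context, the comparative baseline can be approximated with scikit-learn, assuming the DeCAF activations have already been extracted from the pre-trained AlexNet; the feature shapes and random data here are placeholders.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Placeholder DeCAF features (activations of a late AlexNet layer) and
# single-modality labels; real features would be extracted beforehand.
X_train = np.random.rand(200, 4096)
y_train = np.random.randint(0, 30, size=200)   # 30 modality classes, illustrative
X_test = np.random.rand(5, 4096)

clf = LinearSVC()            # linear-kernel SVM, one-vs-rest over modalities
clf.fit(X_train, y_train)

# First baseline variant: predict the modality with the highest SVM score.
scores = clf.decision_function(X_test)         # (n_samples, n_classes)
pred = scores.argmax(axis=1)
# The second variant instead takes the highest posterior probability,
# e.g., via Platt-scaled SVC(kernel="linear", probability=True).
```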
Conclusion
A new method for extracting biomedical modalities from compound figures is proposed. The proposed models obtain more competitive results than the other methods reported in the literature. The proposed cross-modal model exhibits acceptable generalization capability and achieves higher performance. The results imply that the homogeneous transfer learning method can help deep convolutional neural networks (DCNNs) capture a larger number of biomedical domain features and improve multi-label classification performance. The proposed cross-modal model addresses overfitting and dataset imbalance and effectively recognizes modalities in biomedical compound figures based on visual content and textual information. In the future, building DCNNs and training the networks with new techniques could further improve the proposed method.