A survey of deep learning applications in histopathology

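Jin Xu, Wen Ke, Lyu Guofeng, Shi Jun, Chi Mengxian, Wu Zheng, An Hong (School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026)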

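Abstract
Histopathology is the gold standard for clinical tumor diagnosis and bears directly on treatment and prognostic assessment. Clinical demand poses challenges for histopathological diagnosis in both quality and efficiency. Diagnosis involves a large volume of laborious slide reading and depends heavily on the physician's experience, yet pathologists take a long time to train, the shortage of qualified personnel is severe, and pathology departments are commonly overloaded. Deep learning-based methods for computer-aided histopathological diagnosis, which have emerged in recent years, can help physicians improve the accuracy and speed of diagnosis and ease the shortage of pathology resources, and they have attracted wide attention from researchers. This paper gives a preliminary survey of research on deep learning methods in histopathology. It introduces the medical background of histopathological diagnosis, collates the main datasets in the field, and focuses on the pathology data and analysis tasks for breast cancer, lymph node metastasis, and colon cancer, which have received particular attention. It identifies three technical problems to be solved: data storage and processing, model design and optimization, and learning from small samples and weak annotations. Around these problems, it reviews related work on data storage, data preprocessing, classification models, segmentation models, transfer learning, and multiple instance learning. Finally, it summarizes the state of research on deep learning methods for histopathological diagnosis and points out possible directions for improvement.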
Keywords
Survey on the applications of deep learning to histopathology

Jin Xu, Wen Ke, Lyu Guofeng, Shi Jun, Chi Mengxian, Wu Zheng, An Hong(School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026, China)

Abstract
Histopathology is the gold standard for the clinical diagnosis of tumors and is directly related to clinical treatment and prognosis. Clinical practice places demands on both the accuracy and the efficiency of histopathological diagnosis. Pathological diagnosis is time-consuming, because pathologists must examine slides under a microscope to reach reliable decisions. Moreover, the training period of a pathologist is long. In many parts of China, pathology departments are generally overworked because there are too few pathologists. Recently, deep learning has achieved great success in computer vision. The use of whole slide scanners enables the application of deep learning-based classification and segmentation methods to histopathological diagnosis, thereby improving efficiency and accuracy. In this paper, we first introduce the medical background of histopathological diagnosis. Then, we provide an overview of the primary datasets for histopathological diagnosis. We focus on the datasets of three types of malignant tumors, along with the corresponding computer vision tasks. Breast cancer forms in the cells of the breast and is one of the most common cancers diagnosed in women. Early diagnosis can significantly improve survival rate and quality of life. Sentinel lymph node metastases appear when a cancer spreads. The diagnosis of lymph node metastasis is directly related to cancer staging and to decisions on the surgical plan. Colon cancer can be detected by colonoscopy biopsy, and early diagnosis requires pathologists to examine slides thoroughly for small malignancies. Computer-aided diagnosis can increase the efficiency of pathologists. Moreover, we identify three key technical problems: data storage and processing, model design and improvement, and learning from small amounts of data or weakly labeled data.
Then we review research progress related to these problems, including data preprocessing, classification, segmentation, transfer learning, and multiple instance learning. Pathology datasets are usually stored in a pyramidal tiled image format for fast loading and rescaling. The OpenSlide library provides high-performance reading of pathology data, and the open-source software Automated Slide Analysis Platform (ASAP) can be used to view and label these data. Trimming white backgrounds can reduce storage and calculation overhead to 82% on mainstream datasets. Stain normalization can eliminate the color differences caused by slide production and scanning. The classification of pathological image patches is the building block of whole slide classification, and patch classifiers serve as the backbone networks for segmentation. Mainstream convolutional neural network models from computer vision, including AlexNet, Visual Geometry Group (VGG) networks, GoogLeNet, and residual neural networks, can reach satisfactory results on pathological image patches. The patch sampling method divides a whole slide image into smaller patches that can be processed by these mainstream convolutional neural network models. By aggregating the features of the sampled patches through a random forest or voting, a patch sampling method can be used to classify or segment arbitrarily sized images. Transfer learning based on neural network models pretrained on the ImageNet dataset is effective in alleviating the problem caused by the small number of training samples in histopathological data. Fully convolutional networks (FCNs), represented by U-Net, are designed for medical image segmentation tasks and are faster than convolutional neural networks with patch sampling methods. To utilize weakly labeled data, multiple instance learning (MIL) treats a whole slide image as a bag of unlabeled pathological image patches. With only the bags labeled, MIL can be used for weakly supervised learning.
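The patch sampling pipeline described above (tile the slide, trim white background, classify each patch, aggregate by voting) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the image is a small 2D grayscale grid in place of a real whole slide image, the threshold values are arbitrary, and `toy_patch_classifier` is a hypothetical stand-in for a trained convolutional network.

```python
from collections import Counter

# Assumed, illustrative thresholds (not taken from the surveyed work).
WHITE_THRESHOLD = 220      # grayscale values above this count as white background
MAX_WHITE_FRACTION = 0.8   # patches with more white than this are discarded


def sample_patches(image, patch_size):
    """Yield (top, left, patch) for non-overlapping square patches of a 2D grid."""
    height, width = len(image), len(image[0])
    for top in range(0, height - patch_size + 1, patch_size):
        for left in range(0, width - patch_size + 1, patch_size):
            patch = [row[left:left + patch_size]
                     for row in image[top:top + patch_size]]
            yield top, left, patch


def is_background(patch):
    """Return True when the patch is mostly white, i.e. contains little tissue."""
    pixels = [value for row in patch for value in row]
    white = sum(1 for value in pixels if value > WHITE_THRESHOLD)
    return white / len(pixels) > MAX_WHITE_FRACTION


def toy_patch_classifier(patch):
    """Hypothetical stand-in for a CNN: dark patches are labeled 'tumor'."""
    pixels = [value for row in patch for value in row]
    return "tumor" if sum(pixels) / len(pixels) < 100 else "normal"


def classify_slide(image, patch_size, classify_patch=toy_patch_classifier):
    """Classify a slide by majority vote over its non-background patches.

    Returns (label, number_of_tissue_patches); label is None when every
    patch was trimmed as background.
    """
    tissue = [patch for _, _, patch in sample_patches(image, patch_size)
              if not is_background(patch)]
    if not tissue:
        return None, 0
    votes = Counter(classify_patch(patch) for patch in tissue)
    label, _ = votes.most_common(1)[0]
    return label, len(tissue)
```

In practice the patches would be read from a pyramidal slide file and `classify_patch` would be a trained network. Voting is only one of the aggregation options mentioned above, and a random forest over patch features is another.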
Finally, this paper summarizes the surveyed work and identifies challenges for future research. To make deep learning-based computer-aided diagnosis clinically practical, researchers have to improve model accuracy, expand clinical application scenarios, and improve the interpretability of results.
Keywords
