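A Review of Deep Learning Methods for Chromosome Karyotype Analysis
Luo Chunlong, Zhao Yi (Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences)
Abstract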
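Karyotype analysis is an important experimental technique in cytogenetics and is increasingly used in many modern clinical fields, including reproductive medicine, and in scientific research. However, even experienced cytogeneticists need a great deal of time to complete a karyotype analysis. Automated karyotype analysis methods based on traditional techniques have low accuracy, so cytogeneticists must still spend considerable time and effort correcting their output. Many deep learning-based automatic karyotype analysis methods have been published, but the field lacks a summary of its current state and an outlook on its future development. This paper therefore reviews deep learning-based automatic karyotype analysis methods: it summarizes the existing analysis tasks, selects representative methods and organizes their solutions, and discusses future directions. The survey finds that deep learning-based automated karyotype analysis has achieved many results, but some problems remain. First, existing Chinese-language reviews either focus on a single subfield or are not sufficiently comprehensive and in-depth. Second, karyotype analysis tasks are closely tied to clinical practice and constrained by many factors; the task types are numerous and the solutions complex, which makes it hard to infer the whole picture from any single part. Finally, existing methods concentrate on chromosome classification and chromosome segmentation, whereas tasks such as chromosome counting and chromosome preprocessing have received little study; these problems need to be clarified in order to attract more researchers' attention. In summary, deep learning-based automatic karyotype analysis methods still have considerable room for development.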
Keywords
Review of Deep Learning Methods for Karyotype Analysis
Luo Chunlong, Zhao Yi (Institute of Computing Technology, Chinese Academy of Sciences)
Abstract
Chromosomal abnormalities can lead to serious diseases such as chronic myeloid leukemia and Down syndrome. Karyotyping involves counting the chromosomes in metaphase images, segmenting them from the background, arranging them according to established rules, and finally examining them to issue a diagnosis. Karyotype analysis has therefore become widely used in many modern clinical fields and in scientific research. However, even an experienced cytogeneticist needs a great deal of time to perform karyotyping. Although machine learning and traditional geometric methods have been used to automate karyotype analysis, most of them perform poorly and do not meet clinical requirements, so cytogeneticists still spend considerable time on manual intervention. Many deep learning-based methods have been proposed recently, but systematic reviews are lacking. This paper reviews the recent literature and groups it into six tasks: chromosome counting, chromosome segmentation, chromosome cluster classification, chromosome preprocessing, chromosome classification, and chromosome anomaly. First, it summarizes chromosome counting methods based on bounding box detection, whose key point is to locate and identify each chromosome in a metaphase image accurately. Specifically, these methods find candidate object proposals, classify them into different classes, and refine their locations. In doing so, they must solve the self-similarity, over-deletion, and inaccurate localization problems that result from overlapping chromosomes. Researchers also pay attention to accelerating model inference through lightweight backbones. For the chromosome segmentation task, methods fall into two categories: semantic segmentation methods and instance segmentation methods. Semantic segmentation methods can only segment chromosome clusters formed by two or more overlapping chromosomes, and post-processing must be introduced to splice the chromosomes.
Instance segmentation methods can automate chromosome segmentation, and additional supervision such as key points or orientation information can further improve performance. Because some chromosome segmentation methods can handle only a specific type of chromosome cluster, the cluster type must be identified first. Existing methods classify chromosome clusters according to one of two criteria: the number of overlapping chromosomes, or the relationship between touching and overlapping chromosomes. From a methodological perspective, however, current works are mostly based on simple convolutional neural networks, so the chromosome cluster classification task needs more innovative studies. For the chromosome preprocessing task, existing methods mainly address two subtasks: metaphase image denoising and chromosome straightening. Metaphase image denoising is solved as a segmentation problem, in which the chromosomes are treated as a single region to be segmented from the background and from impurities in the image. Existing chromosome straightening methods rely on generative adversarial networks to straighten curved chromosomes and generally follow an image translation framework or a motion transformation framework. Next, benefiting from the rapid development of deep learning-based image classification networks, chromosome classification has received the most attention and development among karyotype analysis tasks.
According to their properties, the available approaches can be divided into five groups: 1) simple CNN-based methods, which redesign the network for chromosome instances instead of directly using well-known CNN models proposed for the ImageNet dataset; 2) feature contrast-based methods, which extract representative features in a contrastive manner and then classify them with a simple classifier; 3) image preprocessing-based methods, which, before classification, apply super-resolution methods to unify the size of chromosome images or enhance banding pattern features using different filters; 4) global and local feature fusion-based methods, which explicitly crop local but important image parts, extract their features, and fuse them for the final classification; and 5) complex strategy-based methods, some of which solve the chromosome classification task by detecting chromosomes in metaphase images, while others improve performance with an ensemble learning framework. The final reviewed task is chromosome anomaly, which includes a detection subtask and a generation subtask. Although this task is of great concern to clinical experts, existing studies can only detect a specific type of chromosome anomaly with a basic CNN or roughly discriminate between normal and abnormal chromosomes with a generative adversarial network framework. For the generation subtask, the available approaches are also based on generative adversarial networks. At the end of the paper, the tasks and main methodologies are summarized and discussed, and feasible future developments are proposed. First, multiple advanced solution paradigms, such as multimodality and image question answering, can be introduced to solve these tasks. Second, chromosomal abnormality diagnosis has not yet been addressed because it involves the extraction of band-level features and relational reasoning.
Finally, pretraining models in a self-supervised manner deserves researchers' attention. Although high-quality labeled chromosome data are scarce, large amounts of unlabeled clinical data can, through the self-supervised learning paradigm, reduce the cost of data labeling and improve performance on downstream tasks. In summary, a review of deep learning-based automatic karyotyping methods is needed and can draw more researchers' attention to this field.
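As an illustration of the over-deletion problem mentioned above for detection-based chromosome counting, the following minimal Python sketch counts chromosomes from bounding boxes after greedy non-maximum suppression (NMS). It rests on an assumption: that over-deletion refers to NMS discarding a true detection because it overlaps a higher-scoring neighbor. The abstract does not state this, and the function names, box coordinates, scores, and thresholds are all hypothetical and are not taken from any reviewed method.

```python
from typing import List, Tuple

# A box is (x1, y1, x2, y2) in pixel coordinates (illustrative convention).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_threshold: float) -> List[int]:
    """Greedy NMS: keep a box only if it does not overlap a kept box too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


def count_chromosomes(boxes: List[Box], scores: List[float],
                      iou_threshold: float = 0.5,
                      score_threshold: float = 0.5) -> int:
    """Count detections that survive score filtering and NMS."""
    cand = [i for i, s in enumerate(scores) if s >= score_threshold]
    kept = nms([boxes[i] for i in cand], [scores[i] for i in cand], iou_threshold)
    return len(kept)


# Hypothetical detections: two heavily overlapping chromosomes and one isolated.
boxes = [(10, 10, 50, 110), (20, 10, 60, 110), (200, 50, 240, 150)]
scores = [0.95, 0.90, 0.92]

print(count_chromosomes(boxes, scores, iou_threshold=0.5))  # 2: one true box is suppressed
print(count_chromosomes(boxes, scores, iou_threshold=0.7))  # 3: all boxes are kept
```

In this toy case the two overlapping boxes have an IoU of 0.6, so a threshold of 0.5 removes one of them and the count drops from 3 to 2. Raising the threshold avoids that here but would in general also keep more duplicate detections of the same chromosome, which is the trade-off such counting methods have to manage.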
Keywords