Performance evaluation of mainstream chromosome classification algorithms under deep learning

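Yi Xusheng1,2,3, Yin Aihua4, Huang Jiesheng1,2,3, Peng Jing1,2,3, Chen Hanbiao4, Guo Li4, Lin Chengchuang1,2,3, Li Shuangyin1,2,3, Zhao Gansen1,2,3(1.School of Computer Science, South China Normal University, Guangzhou 510631;2.Key Laboratory on Cloud Security and Assessment Technology of Guangzhou, Guangzhou 510631;3.VeChain Blockchain Technology and Application Joint Laboratory, South China Normal University, Guangzhou 510631;4.Guangdong Maternity and Child Health Hospital, Guangzhou 511400)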

Abstract
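Objective Chromosome classification is a specific task in medical image processing. Its results give physicians important clinical diagnostic information and play an important role in prenatal diagnosis. Deep learning has been widely applied in medical imaging because of its strong feature representation capability. However, most deep-learning-based chromosome classification algorithms report results of varying quality obtained on small private databases, which makes it difficult to compare algorithms objectively and leaves no standard for screening algorithms for clinical use. It is therefore urgent to evaluate different algorithms on the same large-scale database, so as to obtain objectively comparable performance data; this is important for translating research results into clinical practice. Method Using chromosome data provided by Guangdong Maternity and Child Health Hospital, we built a clinical database of 126 453 chromosomes and selected six mainstream chromosome classification models for comparative experiments and performance evaluation on this database. Result On the large-scale clinical chromosome database constructed in this study, all evaluated models reached a classification accuracy above 92%. The improved MixNet model performed best, at 98.92%. Even the weakest model improved markedly when trained on this dataset: its accuracy rose from 86.7% to 92.09%, 5.39 percentage points above its earlier reported performance. Conclusion This empirical study shows that database size is one of the important factors determining whether a chromosome classification algorithm achieves satisfactory results. Residual neural networks are a relatively suitable architecture for chromosome classification, but limitations such as the lack of interpretability of their results still leave a gap relative to the requirements of high-precision clinical applications. Research on deep-learning-based chromosome classification needs to be pursued further.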
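The accuracy gain reported for the weakest model can be read in two ways. The minimal Python sketch below uses only the two accuracies stated in the abstract to show that 5.39 is the absolute difference in percentage points, whereas the relative improvement over the earlier 86.7% is somewhat larger, about 6.22%.

```python
# Accuracy of the weakest model before and after training on the new
# dataset, as stated in the abstract (in percent).
before = 86.7
after = 92.09

# Absolute gain, in percentage points.
absolute_gain_pp = round(after - before, 2)

# Relative gain, as a percentage of the earlier accuracy.
relative_gain_pct = round((after - before) / before * 100, 2)

print(f"absolute gain: {absolute_gain_pp} percentage points")
print(f"relative gain: {relative_gain_pct}%")
```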
Keywords
Performance evaluation of mainstream chromosome recognition algorithms under deep learning

Yi Xusheng1,2,3, Yin Aihua4, Huang Jiesheng1,2,3, Peng Jing1,2,3, Chen Hanbiao4, Guo Li4, Lin Chengchuang1,2,3, Li Shuangyin1,2,3, Zhao Gansen1,2,3(1.School of Computer Science and Technology, South China Normal University, Guangzhou 510631, China;2.Key Laboratory on Cloud Security and Assessment Technology of Guangzhou, Guangzhou 510631, China;3.VeChain Blockchain Technology and Application Joint Laboratory, South China Normal University, Guangzhou 510631, China;4.Guangdong Maternity and Child Health Hospital, Guangzhou 511400, China)

Abstract
Objective Deep-learning-based medical image processing provides essential clinical information for disease diagnosis, treatment, and surgical planning. Chromosome classification is one specific task in medical image processing, and the clinical diagnostic information it yields supports prenatal diagnosis. In recent years, deep learning techniques trained end to end have developed rapidly, which has also advanced chromosome classification. Chromosomes are key carriers of genetic information, and chromosome-based genetic analysis is widely used to study human genetic diseases. Karyotype analysis of chromosome images is a commonly used method for diagnosing birth defects and is regarded as the "gold standard" for the clinical diagnosis of genetic diseases. Chromosome classification is a difficult step in karyotype analysis, and its results are an important reference for prenatal diagnosis. However, most chromosome classification algorithms have been evaluated on heterogeneous, mostly small private datasets, so there is no standard for screening algorithms for clinical applications. We therefore conduct comparative experiments with multiple chromosome classification models on a large-scale chromosome database that we constructed. Method Our database was constructed from chromosome karyotype data provided by Guangdong Maternity and Child Health Hospital. First, we built a large-scale clinical chromosome database of 126 453 chromosome samples. Then, we selected publicly available mainstream chromosome recognition models. Finally, we carried out experiments and performance evaluation of these models on the clinical chromosome database.
Result Stratified random sampling was used to split the clinical chromosome dataset into a training set (80%), a validation set (10%), and a test set (10%). All selected models were implemented in the PyTorch framework. Model training proceeded as follows. First, all models were initialized with weights pre-trained on the ImageNet classification task and transferred to chromosome classification. Second, a one-cycle learning rate policy (1cycle LR) was used to train each model on the clinical dataset. The batch size was 32 in all experiments (batch_size=32). The loss function was cross-entropy with label smoothing (α=0.1). The learning rate was set to 1E-4, and the maximum number of training iterations was set to 500. In addition, early stopping terminated training if the validation loss did not decrease for five consecutive epochs. Finally, the best trained weights were restored for each model. The large-scale clinical chromosome dataset is useful both for evaluating existing chromosome classification methods and for improving their performance. CirNet and MixNet improve on the initial performance and classification results of the original ResNet by increasing the depth and width of the network and the number of parameters, which yields better classification. A sufficiently large amount of data alleviates overfitting. The best classification accuracy reached 98.92%, but a gap remains relative to the requirements of high-precision clinical applications. Conclusion To achieve high-precision deep-learning-based chromosome classification, more refined network structures need to be designed to further address the homogeneity of chromosome images. The quality and quantity of chromosome data samples must also be ensured.
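The 80%/10%/10% stratified split described in the Result can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `stratified_split`, the fixed random seed, and the per-class rounding rule are assumptions. Stratification is achieved by grouping sample indices by chromosome class and splitting each group separately, so every class keeps roughly the same proportion in all three subsets.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split sample indices into train/val/test sets, stratified by class.

    `labels` holds one class label per chromosome image. The name, the seed
    and the rounding rule are illustrative assumptions, not the paper's code.
    """
    assert len(fractions) == 3 and abs(sum(fractions) - 1.0) < 1e-9
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # random sampling within each class
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        train.extend(indices[:n_train])
        val.extend(indices[n_train:n_train + n_val])
        test.extend(indices[n_train + n_val:])  # remainder goes to the test set
    return train, val, test
```

For example, on a hypothetical toy set of 24 classes with 100 images each, every class contributes 80, 10, and 10 images to the training, validation, and test sets respectively.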
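A one-cycle learning rate schedule (1cycle LR) first warms the learning rate up and then anneals it down within a single cycle. The sketch below is modeled on the defaults of PyTorch's `torch.optim.lr_scheduler.OneCycleLR` (30% warm-up, cosine annealing, `div_factor=25`, `final_div_factor=1e4`). The abstract does not state whether the reported learning rate of 1E-4 served as the peak value or which warm-up fraction was used, so these settings, and the helper names `one_cycle_lr` and `cosine_anneal`, are assumptions.

```python
import math


def cosine_anneal(start, end, pct):
    # Cosine interpolation: returns `start` at pct=0 and `end` at pct=1.
    return end + (start - end) / 2.0 * (1.0 + math.cos(math.pi * pct))


def one_cycle_lr(step, total_steps, max_lr, pct_start=0.3,
                 div_factor=25.0, final_div_factor=1e4):
    """Learning rate at `step` (0-based) of a one-cycle schedule.

    Phase 1 rises from max_lr / div_factor to max_lr over the first
    `pct_start` of the steps; phase 2 falls to
    (max_lr / div_factor) / final_div_factor by the last step.
    The defaults mirror PyTorch's OneCycleLR; using them here is an assumption.
    """
    assert pct_start * total_steps > 1, "schedule too short for a warm-up phase"
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    warmup_end = pct_start * total_steps - 1  # last step of the warm-up phase
    last_step = total_steps - 1
    if step <= warmup_end:
        return cosine_anneal(initial_lr, max_lr, step / warmup_end)
    return cosine_anneal(max_lr, min_lr, (step - warmup_end) / (last_step - warmup_end))
```

With 100 steps and a peak of 1E-4, this schedule starts at 4E-6, peaks at step 29, and ends at 4E-10.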
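The cross-entropy loss with label smoothing (α=0.1) can be written as below for a single sample. The sketch assumes the common formulation in which the smoothed target assigns 1 − α + α/K to the true class and α/K to every other class, where K is the number of classes. This is the convention of the `label_smoothing` argument of PyTorch's `torch.nn.CrossEntropyLoss`, but the abstract does not specify which formulation the authors used. The function names are hypothetical.

```python
import math


def log_softmax(logits):
    # Numerically stable log-softmax: shift by the maximum logit first.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum_exp for z in logits]


def label_smoothed_cross_entropy(logits, target, alpha=0.1):
    """Cross-entropy of one sample against a label-smoothed target.

    The smoothed target gives 1 - alpha + alpha / K to the true class and
    alpha / K to every other class (K = number of classes).
    """
    k = len(logits)
    loss = 0.0
    for cls, log_p in enumerate(log_softmax(logits)):
        q = alpha / k + (1.0 - alpha if cls == target else 0.0)
        loss -= q * log_p
    return loss
```

Setting `alpha=0.0` recovers the ordinary cross-entropy. With α > 0, an over-confident correct prediction is penalized slightly, which is the regularizing effect label smoothing is meant to provide.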
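The early-stopping rule (stop when the validation loss has not decreased for five consecutive epochs, then restore the best weights) can be sketched as a small helper class. The class name `EarlyStopping` and the validation losses in the usage lines are hypothetical. In a real training loop, the model weights would be saved whenever `best_epoch` changes and reloaded after the loop ends.

```python
class EarlyStopping:
    """Signal a stop when validation loss has not decreased for `patience` epochs.

    Also records the epoch with the lowest validation loss, whose weights
    would be restored after training. Hypothetical helper, not the paper's code.
    """

    def __init__(self, patience=5):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_epoch = None
        self.epochs_without_improvement = 0

    def step(self, epoch, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# Usage with hypothetical validation losses, one per epoch.
val_losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.75, 0.70, 0.74]
stopper = EarlyStopping(patience=5)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(epoch, loss):
        stopped_at = epoch
        break
```

Here the loss last decreases at epoch 2, so training stops at epoch 7 and the weights from epoch 2 would be restored.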
Keywords
