Multiscale-fusion dental model segmentation with fine-grained receptive fields

Zhou Xinwen1, Zhu Yang1, Ge Junyi2, Pan Qianjia2, Wei Ran1, Gu Min2 (1. Changzhou University; 2. The Third Affiliated Hospital of Soochow University, The First People's Hospital of Changzhou)

Abstract
Abstract: Objective Accurately segmenting teeth from intra-oral scanned point cloud models is an important task in computer-aided dental treatment, yet performing it manually is time-consuming and tedious. In recent years, a number of end-to-end methods for 3D shape segmentation have emerged in computer vision. To date, however, most of these methods overlook the fact that dental model segmentation requires the network to have a more fine-grained receptive field, so segmentation accuracy remains limited. To address this problem, an end-to-end, fully automatic tooth segmentation network with fine-grained receptive fields, called TRNet, is designed to segment teeth automatically on unprocessed intra-oral scanned point cloud models. Method First, TRNet uses an encoder with fine-grained receptive fields, which extracts more comprehensive dental model features at different scales through multiscale fusion, and improves segmentation performance through a fine-grained grouping query radius better suited to dental model segmentation and a feature extraction layer with relative coordinate normalization. Second, TRNet adopts a feature embedding scheme based on hierarchical connections, which lets the network learn key features of the dental model ranging from individual local regions to larger spatial extents, making feature extraction more comprehensive and improving segmentation accuracy. Meanwhile, TRNet uses a feature fusion scheme based on a soft attention mechanism, enabling the network to better attend to the key information of the dental model within the fused features. Result TRNet was evaluated on a dataset of patients' intra-oral scanned point cloud models acquired with an intra-oral scanner. In 5-fold cross-validation experiments, TRNet achieved an overall accuracy (OA) of 97.015±0.096% and a mean intersection over union (mIoU) of 92.691±0.454%, significantly outperforming existing methods. Conclusion The experimental results show that the proposed multiscale-fusion dental model segmentation network with fine-grained receptive fields performs well on intra-oral scanned point cloud models, improving the network's ability to segment dental models and yielding more accurate point cloud segmentation results.
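To make the two encoder-side ideas above concrete (the fine-grained grouping query radius and the relative-coordinate normalization in the feature extraction layer), the following is a minimal PyTorch-style sketch rather than the authors' implementation; the function name ball_query_group, the radius value, and the neighborhood size are illustrative assumptions.

```python
import torch

def ball_query_group(xyz, centers, radius=0.05, nsample=32):
    """Group up to `nsample` neighbors of each center within `radius` and return
    their coordinates relative to the center, normalized by the radius."""
    dists = torch.cdist(centers, xyz)            # (M, N) pairwise distances
    idx = dists.argsort(dim=1)[:, :nsample]      # closest nsample candidates per center
    grouped = xyz[idx]                           # (M, nsample, 3)
    # Candidates outside the ball are replaced by the center itself (duplicate padding).
    inside = (torch.gather(dists, 1, idx) <= radius).unsqueeze(-1)
    grouped = torch.where(inside, grouped, centers.unsqueeze(1))
    # Dividing the relative coordinates by the radius keeps them in [-1, 1]; without this
    # normalization a fine-grained radius yields tiny values that are harder to optimize.
    return (grouped - centers.unsqueeze(1)) / radius

# Toy usage on a random point cloud standing in for a dental model.
pts = torch.rand(2048, 3)
centers = pts[torch.randperm(2048)[:512]]        # in practice, e.g. farthest point sampling
print(ball_query_group(pts, centers).shape)      # torch.Size([512, 32, 3])
```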
Keywords
Dental model segmentation network with fine-grained receptive fields and multiscale fusion

Zhou Xinwen1, Zhu Yang1, Ge Junyi2, Pan Qianjia2, Wei Ran1, Gu Min2 (1. Changzhou University; 2. The Third Affiliated Hospital of Soochow University, The First People's Hospital of Changzhou)

Abstract
Abstract: Objective Dental computer-aided therapy relies on digital dental models to assist dentists in their practice. One of its most fundamental tasks is the automated segmentation of teeth from point cloud data obtained with intra-oral scanners (IOS). Precise segmentation of each individual tooth is crucial because it provides vital information for a variety of downstream tasks; segmented dental models facilitate customized treatment planning and modeling and thus provide extensive assistance for further treatment. However, automated segmentation of individual teeth from dental models faces three significant challenges. First, the indistinct boundary between teeth and gums makes segmentation based solely on geometric features difficult. Second, factors such as occlusion during scanning can lead to suboptimal scan quality, particularly in posterior dental regions, further complicating segmentation. Finally, patients' teeth often exhibit complex anomalies, including crowding, missing teeth, and misalignment, which make accurate segmentation even harder. To tackle these challenges, two types of conventional methods have been proposed for segmenting teeth in images obtained from IOS. The first is projection-based: a 3D dental scan is projected into 2D space, segmentation is performed in 2D, and the result is mapped back into 3D space. The second is geometry-based, typically using geometric attributes such as surface curvature, geodesic information, harmonic fields, and other geometric properties to distinguish tooth structures. However, these methods are not fully automated and rely on domain-specific knowledge and experience. Moreover, the predefined low-level attributes they use lack robustness when dealing with the complex appearance of patients' teeth. Given the impact of convolutional neural networks (CNN) on computer vision and medical image processing, several CNN-based deep learning methods have been introduced. Some existing methods directly extract translation-invariant deep geometric features from 3D point cloud data. However, they lack the fine-grained receptive field required for tasks such as dental model segmentation, and their network structures are redundant and neglect crucial details of the dental models. To address these issues, a fully automatic tooth segmentation network named TRNet is proposed in this paper. TRNet automatically segments teeth on unprocessed intra-oral scanned point cloud models. Method An end-to-end, 3D point cloud-based, multiscale fusion dental model segmentation method, TRNet, is proposed. It employs an encoder with a fine-grained receptive field to address the fact that each tooth is small relative to the entire dental model and that the boundaries between teeth and gums lack distinct features; a fine-grained receptive field is therefore essential for extracting dental model features. The network adopts a smaller radius when querying point neighborhoods, narrowing the receptive field and enabling the network to focus on finer details.
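As a rough illustration of how a multiscale-fusion encoder stage with fine-grained query radii might look, the sketch below groups the same centers at two small radii and concatenates the pooled per-scale features. It is an assumption-laden simplification (the radii, channel sizes, and the class name MultiScaleGrouping are hypothetical), not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MultiScaleGrouping(nn.Module):
    """Group each center at several fine-grained radii, encode the radius-normalized
    relative coordinates with a small MLP per scale, and concatenate the results."""
    def __init__(self, radii=(0.03, 0.06), nsample=32, channels=64):
        super().__init__()
        self.radii, self.nsample = radii, nsample
        self.mlps = nn.ModuleList(
            nn.Sequential(nn.Linear(3, channels), nn.ReLU(), nn.Linear(channels, channels))
            for _ in radii
        )

    def group(self, xyz, centers, radius):
        # Fine-grained ball query with radius-normalized relative coordinates.
        dists = torch.cdist(centers, xyz)
        idx = dists.argsort(dim=1)[:, :self.nsample]
        grouped = xyz[idx]
        inside = (torch.gather(dists, 1, idx) <= radius).unsqueeze(-1)
        grouped = torch.where(inside, grouped, centers.unsqueeze(1))
        return (grouped - centers.unsqueeze(1)) / radius

    def forward(self, xyz, centers):
        per_scale = []
        for radius, mlp in zip(self.radii, self.mlps):
            rel = self.group(xyz, centers, radius)          # (M, K, 3)
            per_scale.append(mlp(rel).max(dim=1).values)    # max-pool over each ball -> (M, C)
        return torch.cat(per_scale, dim=-1)                 # (M, C * num_scales)

pts = torch.rand(2048, 3)
centers = pts[torch.randperm(2048)[:512]]
print(MultiScaleGrouping()(pts, centers).shape)             # torch.Size([512, 128])
```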
Additionally, downsampling can lead to uneven point cloud density, and a network trained on sparse point clouds then struggles to recognize fine-grained local structures. To address this issue, multiscale feature fusion coding is implemented. Because the encoder uses a smaller query radius to create a fine-grained receptive field, the relative coordinates become correspondingly small; the network therefore has to learn larger weights to operate on them, which makes optimization more difficult. TRNet normalizes the relative coordinates in the feature extraction layer to ease network optimization and enhance segmentation performance. The network also employs a more efficient decoder. Previous segmentation methods often used the U-Net structure, in which skip connections aggregate multi-level features between the input features of each cascaded decoder stage and the output of the corresponding encoder layer. However, this top-down propagation is inefficient for feature aggregation. The decoding approach used by TRNet instead directly combines the features output by all cascaded encoder stages, which allows the network to learn the importance of each cascade. Moreover, discrepancies in the scales or dimensions of the fused features might introduce unwanted bias during fusion. To address this issue and ensure that the network focuses on crucial information within the fused features, a soft attention mechanism is incorporated into the fusion process: a soft attention operation is performed on the newly combined features after concatenation, enabling the network to adaptively balance discrepancies between the propagated features at different scales and levels. Result A dataset was compiled comprising dental models from numerous patients, covering various irregular tooth conditions such as crowding, misalignment, and underdeveloped teeth. To establish ground-truth labels, an experienced dentist meticulously segmented and annotated the teeth. The dataset was then divided randomly into two subsets, with 146 models allocated for training and 20 models reserved for testing. To increase the diversity of the training set, data augmentation with random translation and scaling was employed: in each iteration, intra-oral scans were shifted by a value randomly selected from [-0.1, 0.1] and scaled by a magnification randomly chosen from [0.8, 1.25], thereby generating new training data. In 5-fold cross-validation experiments, TRNet achieved an overall accuracy (OA) of 97.015±0.096% and a mean intersection over union (mIoU) of 92.691±0.454%, significantly outperforming existing methods. Conclusion An end-to-end deep learning network named TRNet has been introduced for the automatic segmentation of teeth in 3D dental images acquired from intra-oral scanners. An encoder with fine-grained receptive fields was implemented, enhancing the local feature extraction capability essential for dental model segmentation. Additionally, a decoder based on hierarchical connections was employed, enabling the network to decode more efficiently by learning the significance of each level and thereby improving the precision of dental model segmentation. Moreover, a soft attention mechanism was integrated into the feature fusion process, enabling the network to focus on key information within dental model features.
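A hedged sketch of the decoding idea described above: per-point features from all encoder levels are upsampled to the full point set (nearest-neighbor interpolation here for simplicity), concatenated directly rather than propagated top-down, and then reweighted with a soft attention gate so the network can balance contributions from different levels. The module name, channel sizes, number of classes, and the sigmoid-gated attention form are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class SoftAttentionFusion(nn.Module):
    """Concatenate per-point features from every encoder level (hierarchical
    connection), reweight the fused channels with a soft attention gate, classify."""
    def __init__(self, level_channels=(64, 128, 256), num_classes=17):
        super().__init__()
        fused = sum(level_channels)
        self.attention = nn.Sequential(nn.Linear(fused, fused), nn.Sigmoid())
        self.classifier = nn.Linear(fused, num_classes)

    @staticmethod
    def upsample(feat, src_xyz, dst_xyz):
        # Nearest-neighbor interpolation: copy each level's features from the
        # closest retained point to every point of the full-resolution cloud.
        nearest = torch.cdist(dst_xyz, src_xyz).argmin(dim=1)   # (N,)
        return feat[nearest]                                    # (N, C)

    def forward(self, full_xyz, level_xyz, level_feats):
        # Hierarchical connection: gather every encoder level directly, not top-down.
        fused = torch.cat(
            [self.upsample(f, xyz, full_xyz) for xyz, f in zip(level_xyz, level_feats)],
            dim=-1)
        fused = fused * self.attention(fused)   # soft attention over the fused channels
        return self.classifier(fused)           # per-point class logits

# Toy usage: a 2048-point cloud with three progressively downsampled encoder levels.
full = torch.rand(2048, 3)
xyzs = [full[:512], full[:128], full[:32]]
feats = [torch.rand(512, 64), torch.rand(128, 128), torch.rand(32, 256)]
print(SoftAttentionFusion()(full, xyzs, feats).shape)   # torch.Size([2048, 17])
```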
The experimental results indicate that TRNet achieves excellent performance on intra-oral scanned point cloud models. It enhances the network's ability to segment dental models, leading to more accurate point cloud segmentation results.
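For reference, the random translation and scaling augmentation described in the Result section above (offsets drawn from [-0.1, 0.1], scale factors from [0.8, 1.25]) could be realized along the lines of this minimal sketch; the function name and the per-axis sampling scheme are assumptions rather than the paper's stated procedure.

```python
import torch

def augment_scan(xyz, shift_range=0.1, scale_range=(0.8, 1.25)):
    """Randomly translate and scale an intra-oral scan point cloud of shape (N, 3)."""
    shift = torch.empty(1, 3).uniform_(-shift_range, shift_range)    # offset in [-0.1, 0.1]
    scale = torch.empty(1).uniform_(scale_range[0], scale_range[1])  # magnification in [0.8, 1.25]
    return xyz * scale + shift

augmented = augment_scan(torch.rand(2048, 3))
print(augmented.shape)   # torch.Size([2048, 3])
```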
Keywords
