Shi Caijuan, Zheng Yuanfan, Ren Bijuan, Kong Fanyue, Duan Changyu (North China University of Science and Technology)
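Objective Breast tumor lesions are highly concealed and prone to metastasis, so computer-aided diagnosis (CAD) is commonly used to detect and diagnose tumors early. However, medical image data are scarce and expensive to annotate, which severely limits the performance and generalization ability of fully supervised deep learning-based methods for breast tumor detection in X-ray images. In addition, the domain shift introduced by noise degrades tumor detection performance across different environments. To address these challenges, this paper proposes a single-domain generalization method for breast tumor detection in X-ray images. By transferring what it learns from a small number of annotated medical images to unforeseen noisy images, the method alleviates the weak generalization caused by the scarcity of labeled medical images, avoids redundant model training, and further improves robustness across environments. Method We propose a Single-Domain Generalization Model (SDGM) for breast tumor detection in X-ray images, with ResNet-50 as the backbone feature extractor. A Domain Feature Enhancement Module (DFEM) is designed to fuse global information from the up-sampling and down-sampling paths to suppress noise, and an Instance Generalization Module (IGM) is designed at the detection head to normalize and whiten the category semantic information of each instance, improving generalization performance. Result To verify the intra-domain generalization performance of SDGM, single-domain X-ray images from INbreast are used as the training set and images with multiple domain shifts as the test set. In the intra-domain generalization scenario, SDGM outperforms FCOS, Cascade-RCNN, FoveaBox, ATSS, TOOD, PVTv2-Transformer, and other methods, improving generalization performance by 9.7% mAP over the Baseline. With less training data, its single-domain generalization performance exceeds that of the Baseline trained under full supervision on INbreast. To further verify the inter-domain generalization performance of SDGM across datasets, experiments using CBIS-DDSM as the training set and INbreast with multiple domain shifts as the test set show that SDGM improves on the Baseline by 5.8%. Conclusion The proposed SDGM effectively mitigates the impact of domain shift on model performance, generalizes under the unknown-domain, small-sample conditions typical of medical data, and can be transferred flexibly to noisy scenarios in unknown domains encountered in clinical practice.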
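IGM whitens per-instance features using Switchable Whitening (SW) sub-modules. The NumPy sketch below is a simplified, assumed version of that idea, not the authors' IGM: it mixes batch-level and instance-level whitening statistics with softmax importance weights and then applies ZCA whitening, x_hat = Sigma^(-1/2) (x - mu). The function name `switchable_whitening`, the two-component mixture (the original SW also mixes BN/IN-style standardization and uses separate weights for means and covariances), and the `eps` value are illustrative assumptions.

```python
import numpy as np


def _softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()


def _inv_sqrt(cov, eps):
    # ZCA whitening matrix cov^(-1/2) from the eigendecomposition of a symmetric matrix
    vals, vecs = np.linalg.eigh(cov)
    return vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T


def switchable_whitening(x, logits=(0.0, 0.0), eps=1e-5):
    """Hypothetical simplified SW. x: (N, C, H, W); logits: (batch, instance) importance."""
    x = np.asarray(x, dtype=float)
    n, c, h, w = x.shape
    flat = x.reshape(n, c, h * w)                      # (N, C, HW)
    w_batch, w_inst = _softmax(logits)

    # Batch statistics: pooled over all samples and spatial positions
    pooled = flat.transpose(1, 0, 2).reshape(c, -1)    # (C, N*HW)
    mu_b = pooled.mean(axis=1, keepdims=True)          # (C, 1)
    cen_b = pooled - mu_b
    cov_b = cen_b @ cen_b.T / pooled.shape[1]          # (C, C)

    out = np.empty_like(flat)
    for i in range(n):
        xi = flat[i]                                   # (C, HW), one image or instance
        mu_i = xi.mean(axis=1, keepdims=True)
        cen_i = xi - mu_i
        cov_i = cen_i @ cen_i.T / xi.shape[1]
        # Mix batch and instance statistics with the learned-style weights
        mu = w_batch * mu_b + w_inst * mu_i
        cov = w_batch * cov_b + w_inst * cov_i
        out[i] = _inv_sqrt(cov, eps) @ (xi - mu)
    return out.reshape(n, c, h, w)
```

When the instance logit dominates, each sample's channels come out decorrelated with unit variance; when the batch logit dominates, the same holds for the pooled batch. In a trained model the logits would be learnable parameters rather than fixed inputs.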
Single-domain generalized breast tumor detection in X-ray images
Shi Caijuan, Zheng Yuanfan, Ren Bijuan, Kong Fanyue, Duan Changyu (North China University of Science and Technology)
Objective Detecting breast tumors in X-ray images is a formidable challenge in medical image analysis, mainly because these lesions are highly concealed and prone to metastasis. Computer-aided diagnosis (CAD) currently plays a pivotal role in early tumor detection and diagnosis. Deep learning-based object detection methods have achieved remarkable progress in detecting breast tumors in X-ray images when the training and test data come from the same domain. However, medical image data are scarce, and annotating them is labor-intensive and requires professional expertise, which limits both the detection performance and the generalization ability of these models. In addition, the domain shift that noise introduces in unseen domains significantly degrades breast tumor detection performance across diverse environments. To address these issues, various studies have proposed methods based on domain adaptation and domain generalization. However, domain adaptation requires the data to be partitioned into source and target domains, while domain generalization requires training on multiple domains, and such domain partitioning is difficult to achieve given the limited availability of medical data. In response, single-domain generalization methods, which train a model on a single domain and then generalize it to unseen domains, have been proposed in recent years. These methods suit medical data well and help mitigate domain shift. Although single-domain generalization has been widely applied to classification, its application to object detection remains relatively nascent because of the inherent differences between the two tasks.
Our analysis shows that in classification, each image contains only one instance to be considered, which makes overall (image-level) alignment straightforward. In contrast, object detection must consider multiple objects in each image simultaneously, which leads to mismatched instances. We therefore propose an instance alignment paradigm that enables single-domain generalization for breast tumor detection. Method To improve generalization for robust breast tumor detection in X-ray images, we propose a Single-Domain Generalization Model (SDGM). SDGM is built on the Baseline (RetinaNet) and uses ResNet-50 as its backbone. Two key modules are developed: the Instance Generalization Module (IGM) and the Domain Feature Enhancement Module (DFEM). First, the IGM is placed at the detection head to improve generalization by normalizing and whitening the category semantic information of each instance. IGM comprises N sets of 3×3 convolutions and Switchable Whitening (SW) sub-modules; because SW is widely recognized as effective for extracting domain-invariant instance features in classification tasks, IGM is integrated into the classification branch of the detection head. Second, the DFEM is designed to fuse global information from both the up-sampling and down-sampling paths while mitigating the impact of noise in medical images. To counteract the noise that conventional convolution introduces into spatial features, a 3×3 convolution generates a foreground mask, which serves as the offsets that guide the sampling of a deformable convolution. Channel-wise attention is then applied to selectively suppress noise within each channel.
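The channel-wise noise suppression step in DFEM can be illustrated with a gating sketch. The abstract does not specify the attention design, so the NumPy code below assumes squeeze-and-excitation (SE) style attention: global average pooling per channel, a ReLU bottleneck, and sigmoid gates that rescale each channel. The weights `w1`, `b1`, `w2`, `b2` and the reduction ratio are hypothetical stand-ins for learned parameters, and the mask-guided deformable-convolution sampling is not modeled here.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention(feat, w1, b1, w2, b2):
    """Hypothetical SE-style channel gating.

    feat: (C, H, W) feature map; w1: (C // r, C); b1: (C // r,);
    w2: (C, C // r); b2: (C,), where r is the reduction ratio.
    Returns the reweighted features and the per-channel gates.
    """
    squeeze = feat.mean(axis=(1, 2))             # (C,) global average pool per channel
    hidden = np.maximum(w1 @ squeeze + b1, 0.0)  # ReLU bottleneck, (C // r,)
    gates = sigmoid(w2 @ hidden + b2)            # (C,) gates in (0, 1)
    # Channels with gates near 0 are suppressed; gates near 1 pass through
    return feat * gates[:, None, None], gates
```

Because every gate lies between 0 and 1, the module can only attenuate a channel, never amplify it, which is what "selectively suppress noise within each channel" requires.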
The DFEM is incorporated into the feature fusion stage of the Feature Pyramid Network (FPN) to attenuate noise when feature maps of different scales are fused, which facilitates the subsequent extraction of domain-invariant features. Result To assess the efficacy of SDGM, we conducted extensive intra-domain single-domain generalization experiments on the INbreast dataset, training on single-domain images and testing on images with multiple domain shifts, and compared SDGM against several state-of-the-art methods. We also evaluated inter-domain generalization by training on the CBIS-DDSM dataset and testing on the INbreast dataset. In the intra-domain single-domain generalization scenario, SDGM outperformed the Baseline, RetinaNet, by 9.7% in mean average precision (mAP). It also surpassed one-stage anchor-free methods such as FCOS and FoveaBox, one-stage anchor-based methods such as ATSS and TOOD, two-stage methods such as FasterRCNN and Cascade-RCNN, and the transformer-based PVTv2. Compared with fully supervised methods, SDGM trained on only 728 images surpassed RetinaNet, Cascade-RCNN, FoveaBox, and FCOS, all trained on a larger set of 5148 images. This demonstrates that SDGM generalizes well enough to outperform supervised methods with roughly one-seventh of the training data. We also assessed the impact of attention mechanisms on model performance. Compared with TOOD, which uses no attention, SDGM reduced the performance loss caused by domain shift by at least 3.6% in the single-domain generalization scenario. Compared with PVTv2 and ResNeSt, which employ different attention mechanisms, SDGM reduced this loss by 21.1% and 2.8%, respectively.
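The intra-domain evaluation trains on single-domain INbreast images and tests on noise-shifted images. The abstract does not list the noise types or their strengths, so the sketch below assumes three common corruptions (Gaussian, salt-and-pepper, speckle) with arbitrary parameters, and defines the "domain shift performance loss" hypothetically as the mean drop in mAP from the clean test set to the shifted ones. Intensities are assumed to be normalized to [0, 1], and all function names are illustrative.

```python
import numpy as np


def gaussian_noise(img, sigma, rng):
    # Additive zero-mean Gaussian noise, clipped back to the valid range
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)


def salt_and_pepper(img, amount, rng):
    # Set a random fraction `amount` of pixels to black (pepper) or white (salt)
    out = np.asarray(img, dtype=float).copy()
    u = rng.random(img.shape)
    out[u < amount / 2] = 0.0
    out[u > 1.0 - amount / 2] = 1.0
    return out


def speckle(img, sigma, rng):
    # Multiplicative noise: each pixel is scaled by (1 + n), n ~ N(0, sigma^2)
    return np.clip(img * (1.0 + rng.normal(0.0, sigma, img.shape)), 0.0, 1.0)


def make_shifted_domains(img, seed=0):
    """Build hypothetical noise-shifted test domains from one clean image."""
    rng = np.random.default_rng(seed)
    return {
        "gaussian": gaussian_noise(img, 0.1, rng),
        "salt_pepper": salt_and_pepper(img, 0.05, rng),
        "speckle": speckle(img, 0.2, rng),
    }


def performance_loss(clean_map, shifted_maps):
    """Assumed definition: mean mAP drop (points) from clean to shifted domains."""
    drops = [clean_map - m for m in shifted_maps]
    return sum(drops) / len(drops)
```

For example, `performance_loss(50.0, [40.0, 45.0])` returns 7.5; these numbers are purely illustrative and are not results from the paper.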
In the inter-domain single-domain generalization scenario, SDGM improved performance by 5.8% over the Baseline. Together, these results indicate that SDGM not only mitigates performance degradation but also generalizes robustly across datasets. Conclusion In this paper, we devise a single-domain generalization model (SDGM) for detecting breast tumors in X-ray images. SDGM uses ResNet-50 as its backbone and incorporates two key components: the Domain Feature Enhancement Module (DFEM) and the Instance Generalization Module (IGM). DFEM improves performance by combining global information across image scales and reducing the impact of image noise. IGM, positioned at the detection head, improves generalization by normalizing and whitening the category information of each object. Evaluations on the INbreast and CBIS-DDSM datasets against multiple benchmarks show that SDGM is effective: it handles domain shift and performs well even with limited labeled medical data, a key challenge in medical image analysis. SDGM is also robust across different environmental conditions. In summary, SDGM offers a promising approach to breast tumor detection in X-ray images and has potential value for clinical practice.