
Li Yichen, Chen Dali, Guo Dinghao, Sun Yu (College of Information Science and Engineering, Northeastern University)

Abstract

Objective Timely and accurate diagnosis of breast tumors can improve patient survival. B-mode ultrasound (B-US) carries spatial information such as lesion morphology and region size, while contrast-enhanced ultrasound (CEUS) carries spatial information on the vascular distribution in the lesion area together with temporal information from the period when the contrast agent flows through it; considering both modalities jointly helps improve the accuracy of breast tumor diagnosis. However, most current models focus only on B-US feature extraction, neglecting both the learning of CEUS features and the fusion of the two modalities. To address this, this paper proposes a dual-modal breast tumor diagnosis model integrating spatiotemporal features with a temporal constraint (spatio-temporal feature and temporal-constrained model, STFTCM). Method First, based on the characteristics of the dual-modal data, a heterogeneous dual-branch network learns the spatiotemporal features contained in B-US and CEUS. Then, a temporal attention loss function is designed to guide the CEUS branch to focus on the time window during which the contrast agent flows into the lesion area and to extract CEUS features from that window. Finally, feature fusion modules establish lateral connections between the two branches, feeding B-US features into the CEUS branch as supplementary information to complete the dual-modal feature fusion. Result In comparative experiments on the collected dataset, STFTCM performs excellently on accuracy, sensitivity, macro-average F1, and area under the curve (AUC), reaching a prediction accuracy of 88.2% and leading other advanced models. In the ablation experiments, the temporal attention constraint raises prediction accuracy by 5.8 percentage points, and feature fusion raises diagnostic accuracy by at least 2.9 percentage points over the single-branch models. Conclusion The proposed STFTCM effectively extracts and fuses B-US and CEUS dual-modal information and yields accurate diagnoses; the temporal attention constraint and feature fusion modules significantly improve model performance.
Integrating spatiotemporal features and temporal constraints for dual-modal breast tumor diagnosis

Li Yichen, Chen Dali, Guo Dinghao, Sun Yu (College of Information Science and Engineering, Northeastern University)

Abstract: Objective Breast cancer ranks first in cancer incidence among women worldwide, threatening the health of the female population. Timely diagnosis of breast tumors can offer patients better treatment opportunities. B-mode ultrasound (B-US) imaging contains rich spatial information, such as lesion size and morphology, and its low cost and high safety have made it widely used in breast tumor diagnosis. With the advancement of deep learning, a number of deep models have been applied to computer-aided breast tumor diagnosis based on B-US to assist doctors. However, diagnosis based solely on B-US imaging suffers from low specificity, and the performance of models trained exclusively on B-US is limited by the single modality of the information source. Contrast-enhanced ultrasound (CEUS) provides a second modality on top of B-US. By injecting a contrast agent intravenously and capturing the time window during which the agent flows into the lesion area, CEUS yields rich spatiotemporal information such as brightness enhancement and vascular distribution in the lesion area. Considering B-US and CEUS dual-modal information jointly can therefore enhance diagnostic accuracy. To effectively exploit dual-modal data for breast tumor diagnosis, a spatiotemporal feature and temporal-constrained model (STFTCM) for dual-modal breast tumor diagnosis is proposed. Method STFTCM mainly comprises a heterogeneous dual-branch network, a temporal attention constraint module, and feature fusion modules. Based on the information dimensions of the two modalities, STFTCM adopts a heterogeneous dual-branch structure to extract the features of each modality separately. For the B-US branch, the information consists mainly of spatial features within the two-dimensional video frames, and inter-frame changes are not prominent.
Considering that training a 3D convolutional network on a small dataset tends to overfit because of its larger parameter count compared with a 2D network, a 2D network, ResNet-18, is used as the backbone for feature extraction from a single frame sampled from the video. In contrast, CEUS video frames undergo noticeable changes during the time window when the contrast agent flows through the lesion area and thus contain rich spatiotemporal information, so a 3D network, R(2+1)D-18, is used as the backbone of the CEUS branch. To ensure that the feature maps extracted from corresponding layers of the two branches have matching dimensions for subsequent fusion, the backbone structures are adjusted accordingly. Because the spatiotemporal information in CEUS resides mainly within the time window when the contrast agent flows into the lesion area, guiding the model to focus on this segment facilitates learning CEUS features from a small dataset. To this end, a temporal attention loss function is proposed. Drawing on temporal prior knowledge of CEUS video, the loss function first determines the temporal attention boundary from the first-order difference of the discrete sequence of frame luminance values and then builds a temporal mask. Guided by this mask, the temporal attention loss steers the updating of the R(2+1)D temporal convolution kernels in the CEUS branch, directing the model to focus on the information from the period when the contrast agent flows into the lesion area. Furthermore, to improve prediction accuracy by jointly considering B-US and CEUS information, feature fusion modules are introduced to fuse the two modalities.
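The temporal mask construction described above might be sketched as follows. The abstract specifies only that the attention boundary comes from the first-order difference of the frame luminance sequence; the concrete boundary rule here (a window of frames whose enhancement rate stays above a fixed fraction of the peak rate) is an illustrative assumption, not the paper's exact formulation:

```python
import numpy as np

def temporal_mask(luminance, ratio=0.5):
    """Build a binary temporal mask over CEUS frames.

    luminance: mean brightness of each frame (length T).
    The wash-in window is located where the first-order difference
    of the brightness series is largest (fastest enhancement).
    ratio: assumed threshold, as a fraction of the peak rate.
    """
    diff = np.diff(luminance)      # first-order difference, length T-1
    peak = int(np.argmax(diff))    # index of fastest enhancement
    thresh = ratio * diff[peak]
    # Grow a contiguous window around the peak while the enhancement
    # rate stays above the threshold.
    start, end = peak, peak
    while start > 0 and diff[start - 1] >= thresh:
        start -= 1
    while end < len(diff) - 1 and diff[end + 1] >= thresh:
        end += 1
    mask = np.zeros(len(luminance), dtype=int)
    # diff[i] relates frames i and i+1, so the window spans
    # frames start .. end+1 inclusive.
    mask[start:end + 2] = 1
    return mask
```

In the model, such a mask would weight the temporal attention loss so that the R(2+1)D temporal kernels are penalized for attending outside the wash-in window.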
To control the model's parameter size, no separate third fusion branch is set up; instead, the feature fusion modules fuse features between the two branches through lateral connections, incorporating B-US spatial information into the CEUS branch as supplementary data. Each feature fusion module comprises a spatial feature fusion module and an identity mapping branch: the spatial feature fusion module combines the two-dimensional spatial feature maps of B-US with the three-dimensional spatiotemporal feature maps of CEUS, while the identity mapping branch prevents loss of the original CEUS features during fusion. Result To evaluate the performance of STFTCM and the effectiveness of the temporal attention constraint and feature fusion modules, comparative experiments and structural ablation experiments, among others, are conducted. The experimental data are obtained from Shengjing Hospital of China Medical University and comprise 332 contrast-enhanced ultrasound videos categorized into benign tumors, malignant tumors, and inflammations, with 101, 102, and 129 instances, respectively. Accuracy, sensitivity, specificity, macro-average F1, and area under the curve (AUC) are used as evaluation metrics. In the comparative experiments, STFTCM achieves an accuracy of 0.882, with scores of 0.909, 0.870, 0.883, and 0.952 on the other four metrics, outperforming other advanced models. In single-branch comparison experiments, both the B-US and CEUS branches of STFTCM also outperform other advanced models, and the comparison between dual-branch and single-branch models further demonstrates the strong performance of STFTCM.
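The lateral connection described above (a 2D B-US map fused into a 3D CEUS map, plus an identity branch) might be sketched as follows; the tensor shapes are the natural ones for 2D and 3D CNN features, and the elementwise average is an illustrative stand-in for the paper's learned spatial feature fusion module:

```python
import numpy as np

def fuse_features(bus_feat, ceus_feat):
    """Fuse a 2D B-US feature map into a 3D CEUS feature map.

    bus_feat:  (C, H, W)    spatial features from the B-US branch
    ceus_feat: (C, T, H, W) spatiotemporal features from the CEUS branch
    """
    # Broadcast the B-US map along the temporal axis so shapes align.
    bus_3d = np.broadcast_to(bus_feat[:, None, :, :], ceus_feat.shape)
    # Spatial feature fusion: in the paper this is a learned module;
    # an elementwise average is used here as a placeholder.
    fused = 0.5 * (bus_3d + ceus_feat)
    # Identity mapping branch: add the original CEUS features back so
    # they are not lost during fusion.
    return fused + ceus_feat
```

The identity term mirrors a residual connection: even if the fusion operation is poorly conditioned early in training, the CEUS branch still receives its own features unchanged.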
The structural ablation results show that the temporal attention loss constraint improves prediction accuracy by 5.8 percentage points, and dual-modal feature fusion improves prediction accuracy by at least 2.9 percentage points over unimodal prediction, confirming that the temporal attention constraint and feature fusion modules are effective in enhancing model performance. In addition, visualizing model attention with class activation maps (CAM) validates that the temporal attention constraint improves the model's attention in the temporal dimension, guiding better extraction of the spatiotemporal information contained in CEUS. Experiments on the feature fusion module show that adding the identity mapping branch improves prediction accuracy by 2.9 percentage points, further confirming the rationality of the module's structural design. Conclusion STFTCM, designed on the basis of prior knowledge, demonstrates excellent performance in breast tumor diagnosis. The heterogeneous dual-branch structure, designed around the characteristics of the two modalities, effectively extracts B-US and CEUS dual-modal features while reducing the number of parameters for better optimization on a small dataset. The temporal attention loss function constrains the model's attention in the temporal dimension, guiding it to focus on information from the time window when the contrast agent flows into the lesion area. Furthermore, the feature fusion modules effectively implement lateral connections between the two branches to fuse the dual-modal features.