Cooperative suppression network for bimodal data in breast cancer classification
2020, Vol. 25, No. 10: 2218-2228
Received: 2020-05-31; Revised: 2020-07-07; Accepted: 2020-07-14; Published in print: 2020-10
DOI: 10.11834/jig.200246
Objective
Correct early diagnosis of breast cancer through deep learning can greatly improve patient survival. Most current studies use only B-mode ultrasound images as experimental data, but the inherent limitations of B-mode ultrasound make it difficult to improve classification performance. To address this problem, this paper proposes a network model that jointly exploits B-mode ultrasound images and contrast-enhanced ultrasound (CEUS) video to improve classification accuracy.
Method
A dual-branch model architecture is designed for the characteristics of the bimodal data, namely B-mode ultrasound images and CEUS video. To overcome the shortcoming that conventional video feature extraction relies on a single label, a pathological multilabel pretraining scheme is formulated. In addition, a new bilinear cooperative mechanism is designed to better fuse the features of B-mode ultrasound and CEUS, extracting the pathological information and suppressing irrelevant noise.
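The dual-branch design can be sketched as follows. This is a minimal, illustrative PyTorch snippet, not the authors' released code; it assumes torchvision's resnet34 and r2plus1d_18 as stand-ins for the B-mode and CEUS backbones, with the classification heads removed so that each branch outputs a 512-dimensional feature vector.

import torch
import torch.nn as nn
from torchvision.models import resnet34
from torchvision.models.video import r2plus1d_18

class DualBranchExtractor(nn.Module):
    """B-mode image branch (2D ResNet34) plus CEUS clip branch (R(2+1)D)."""
    def __init__(self):
        super().__init__()
        # Drop the classification heads; keep each backbone as a feature extractor.
        self.bmode_branch = nn.Sequential(*list(resnet34().children())[:-1])     # -> (N, 512, 1, 1)
        self.ceus_branch = nn.Sequential(*list(r2plus1d_18().children())[:-1])   # -> (N, 512, 1, 1, 1)

    def forward(self, bmode_img, ceus_clip):
        # bmode_img: (N, 3, H, W) single B-mode frame; ceus_clip: (N, 3, T, H, W) CEUS clip
        f_b = self.bmode_branch(bmode_img).flatten(1)   # (N, 512) B-mode feature
        f_c = self.ceus_branch(ceus_clip).flatten(1)    # (N, 512) CEUS feature
        return f_b, f_c

# Shape check with random tensors.
f_b, f_c = DualBranchExtractor()(torch.randn(2, 3, 224, 224), torch.randn(2, 3, 16, 112, 112))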
Result
To verify the effectiveness of the proposed method, three experiments were designed; the first two pretrain the B-mode ultrasound branch and the CEUS branch, respectively. In the CEUS branch, pretraining uses pathological multilabels designed according to medical domain knowledge. Finally, the third experiment uses the pretrained models from the first two experiments: accuracy improves by 6.5% over using B-mode ultrasound images alone and by 7.9% over using CEUS video alone. Moreover, among methods using bimodal data, the proposed method achieves the highest accuracy, 2.7% higher than the second-best result.
Conclusion
The proposed cooperative suppression network can process data of different modalities differently to extract their pathological features. On the one hand, multimodal data indeed present the same lesion area from different perspectives, providing the classification model with more pathological features and thus higher classification accuracy. On the other hand, an appropriate fusion strategy is also crucial, as it makes the most of the features while suppressing noise.
Objective
Computer-aided breast cancer diagnosis is a fundamental problem in the field of medical imaging. Correct diagnosis of breast cancer through deep learning can immensely improve patients' survival rate. At present, most researchers use only B-mode ultrasound images as experimental data, but the limitations of B-mode ultrasound make it difficult to achieve high classification accuracy. With the development of medical imaging, contrast-enhanced ultrasound (CEUS) video can provide accurate pathological information by showing the dynamic enhancement of the lesion area over time. In view of these problems with ultrasound images, this paper proposes a network model that comprehensively utilizes B-mode ultrasound data and CEUS video data to improve classification accuracy.
Method
First, a dual-branch model architecture is designed on the basis of the two-stream structure and the characteristics of the dual-modal data. One branch uses a single frame of the B-mode ultrasound data and a ResNet34 network to extract pathological features. The other branch uses the contrast-enhanced ultrasound data and an R(2+1)D network to extract temporal information. Second, to address the shortcoming of traditional video feature extraction that relies on a single label, pathological multilabel pretraining is designed for the CEUS branch using 10 items of pathological information in the CEUS video data. After the two-branch network, the features of the B-mode ultrasound data and the CEUS video data are obtained. We perform bilinear fusion on the obtained features to better integrate the features of B-mode ultrasound and CEUS. To extract pathological information and suppress irrelevant noise, the extracted and fused features from the two-branch network are processed by an attention mechanism to obtain the attention weight of each feature, and the corresponding weight is applied to the original feature, yielding weighted ultrasound and contrast features. Finally, the features obtained through the attention mechanism are bilinearly fused to obtain the final features.
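A hedged sketch of this cooperative fusion step is given below. It is not the authors' exact implementation: the 512-dimensional feature size, the use of a concatenated joint descriptor (rather than the full bilinear map) to drive the attention gates, and the SE-style sigmoid gating are assumptions made for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CooperativeBilinearFusion(nn.Module):
    """Gate each modality's channels from a joint descriptor, then fuse bilinearly."""
    def __init__(self, dim=512, reduction=16):
        super().__init__()
        self.gate_b = nn.Sequential(nn.Linear(2 * dim, dim // reduction), nn.ReLU(inplace=True),
                                    nn.Linear(dim // reduction, dim), nn.Sigmoid())
        self.gate_c = nn.Sequential(nn.Linear(2 * dim, dim // reduction), nn.ReLU(inplace=True),
                                    nn.Linear(dim // reduction, dim), nn.Sigmoid())

    @staticmethod
    def bilinear(x, y):
        # Outer-product (bilinear) pooling with signed square-root and L2 normalization.
        z = torch.bmm(x.unsqueeze(2), y.unsqueeze(1)).flatten(1)   # (N, dim * dim)
        z = torch.sign(z) * torch.sqrt(torch.abs(z) + 1e-8)
        return F.normalize(z, dim=1)

    def forward(self, f_b, f_c):
        joint = torch.cat([f_b, f_c], dim=1)            # joint descriptor drives the attention gates
        w_b, w_c = self.gate_b(joint), self.gate_c(joint)
        # Re-weight the original modality features, then fuse them bilinearly.
        return self.bilinear(f_b * w_b, f_c * w_c)

final_feat = CooperativeBilinearFusion()(torch.randn(2, 512), torch.randn(2, 512))  # (2, 262144)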
Result
Three experiments are designed to verify the effectiveness of the proposed method, where the first two pretrain the B-mode ultrasound branch and the CEUS branch and select the networks with the strongest feature extraction ability for ultrasound data. In the B-mode ultrasound pretraining experiment, the classic VGG(visual geometry group)16-BN(batch normalization), VGG19-BN, ResNet18, ResNet34, and ResNet50 networks are taken as candidate backbones of the ultrasound branch to find the network with the strongest extraction ability for ultrasound images. The final classification results of these networks are 74.2%, 75.6%, 80.5%, 81.0%, and 92.1%, respectively. Considering that the accuracy of the ResNet50 network on the test set is only 79.3%, which differs considerably from its training-set accuracy and indicates serious overfitting, the ResNet34 network is used as the backbone of the B-mode ultrasound branch. In the pretraining experiment of the CEUS branch, the mainstream P3D, R3D, MC3, and R(2+1)D convolutional networks are used as candidate backbones for training. Their final classification results are 75.2%, 74.6%, 74.1%, and 78.4%, respectively, and the R(2+1)D network, which performs best in this experiment, is selected as the backbone of the CEUS branch. Pretraining with pathological multilabels designed according to medical domain knowledge is then applied. The accuracy of the experiment combining the two modalities improves by 6.5% compared with using B-mode ultrasound images alone and by 7.9% compared with using CEUS video alone. Moreover, among methods using bimodal data, the proposed method achieves the highest accuracy, 2.7% higher than the second-best result.
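For reference, the pathological multilabel pretraining objective of the CEUS branch can be sketched as below. The 10 binary pathological attributes are only counted, not named, in this abstract, so the label vector here is purely illustrative; the backbone and loss choices (torchvision's r2plus1d_18 with binary cross-entropy) are assumptions consistent with the description above.

import torch
import torch.nn as nn
from torchvision.models.video import r2plus1d_18

NUM_ATTRIBUTES = 10  # number of binary pathological attributes (names not given in the abstract)

backbone = r2plus1d_18()
backbone.fc = nn.Linear(backbone.fc.in_features, NUM_ATTRIBUTES)   # replace head with a multilabel head
criterion = nn.BCEWithLogitsLoss()                                  # one independent binary loss per attribute

clips = torch.randn(4, 3, 16, 112, 112)                             # (N, C, T, H, W) CEUS clips
labels = torch.randint(0, 2, (4, NUM_ATTRIBUTES)).float()           # per-clip multilabel targets
loss = criterion(backbone(clips), labels)
loss.backward()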
Conclusion
The proposed cooperative suppression network can process different modal data differently to extract the pathological features. On the one hand, multimodal data can indeed display the same lesion area from different angles, providing more pathological features for the classification model and thereby improving its classification accuracy. On the other hand, a proper fusion method is crucial because it can maximize the use of features and suppress noise.