Advancements in 3D vision understanding using multimodal large language models

“Three-dimensional visual perception and understanding have made significant progress in fields such as robot navigation and autonomous driving. The fusion of multimodal large models with 3D data presents unique advantages, paving the way for the development of spatial intelligence.”
Vol. 30, Issue 6, Pages: 1744-1791(2025)
Received: 29 September 2024, Revised: 22 December 2024, Published: 16 June 2025
DOI: 10.11834/jig.240588

Feng Mingtao, Shen Junhao, Wu Zijie, Peng Weixing, Zhong Hang, Guo Yulan, Shu Xiangbo, Zhang Hui, Dong Weisheng, Wang Yaonan. 2025. Advancements in 3D vision understanding using multimodal large language models. Journal of Image and Graphics, 30(6): 1744-1791. DOI: 10.11834/jig.240588.

