Multimodal large model-based method for generating visual Q&A data for electronic document images

Visual-Text Multimodal Large Language Models

"This research advances the generation of visual Q&A data for electronic documents, significantly improving the document reading performance of multimodal large language models."
Vol. 30, Issue 9, Pages 3083-3096 (2025)
Received: 16 October 2024
Revised: 16 February 2025
Accepted: 25 February 2025
Published: 16 September 2025
DOI: 10.11834/jig.240610

Li Yuzhe, Fu Ling, Zhu Linghao, Luo Qidi, Tu Lai. 2025. Multimodal large model-based method for generating visual Q&A data for electronic document images. Journal of Image and Graphics (中国图象图形学报), 30(9): 3083-3096. DOI: 10.11834/jig.240610.


Related institutions

School of Electronic and Information Engineering, South China University of Technology

SCUT-Zhuhai Institute of Modern Industrial Innovation
