Visual-Text Multimodal Large Language Models
    • Multimodal large model-based method for generating visual Q&A data for electronic document images

    • Recent research makes a breakthrough in generating visual Q&A data for electronic document images, significantly improving the document-reading performance of multimodal large language models.
    • Vol. 30, Issue 9, Pages: 3083-3096 (2025)

      Received: 16 October 2024

      Revised: 16 February 2025

      Accepted: 25 February 2025

      Published: 16 September 2025

    • DOI: 10.11834/jig.240610     

  • Li Yuzhe, Fu Ling, Zhu Linghao, Luo Qidi, Tu Lai. 2025. Multimodal large model-based method for generating visual Q&A data for electronic document images. Journal of Image and Graphics, 30(9):3083-3096 DOI: 10.11834/jig.240610.