Visual-Text Multimodal Large Language Models
    • TextLLM: a document multimodal large model based on dynamic resolution

    • This paper proposes TextLLM, a dynamic-resolution multimodal document model that processes high-resolution document images without OCR tools and significantly improves document understanding performance.
    • Vol. 30, Issue 9, Pages: 3068-3082 (2025)

      Received: 16 October 2024

      Revised: 17 January 2025

      Accepted: 18 February 2025

      Published: 16 September 2025

    • DOI: 10.11834/jig.240608

  • Yang Biao, Liu Yuliang, Liu Qiang, Zhu Yingying. 2025. TextLLM: a document multimodal large model based on dynamic resolution. Journal of Image and Graphics, 30(9): 3068-3082. DOI: 10.11834/jig.240608.

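Related authors

Liu Qiang, Wuhan Kingsoft Office Software Co., Ltd.
Guo Yulan, School of Electronic and Communication Engineering, Sun Yat-sen University (Shenzhen Campus)
Zhang Ye, School of Electronic and Communication Engineering, Sun Yat-sen University (Shenzhen Campus)
Li Haoran, School of Electronic and Communication Engineering, Sun Yat-sen University (Shenzhen Campus)
Liu Yan, School of Electronic and Communication Engineering, Sun Yat-sen University (Shenzhen Campus)
Liu Xuefan, School of Electronic and Communication Engineering, Sun Yat-sen University (Shenzhen Campus)
Liu Yichen, Electronic Information School, Wuhan University
Yu Lei, Electronic Information School, Wuhan University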

Related institutions

Wuhan Kingsoft Office Software Co., Ltd.
School of Electronic and Communication Engineering, Sun Yat-sen University
Electronic Information School, Wuhan University
State Key Laboratory of Multi-Modal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences
School of Artificial Intelligence, University of Chinese Academy of Sciences