Visual-Text Multimodal Large Language Models

    • TextLLM: a document multimodal large model based on dynamic resolution

    • This work proposes TextLLM, a multimodal large model for documents based on dynamic resolution. It processes high-resolution document images without relying on external OCR tools and improves document understanding performance.
    • Vol. 30, Issue 9, Pages: 3068-3082 (2025)

      Received: 16 October 2024

      Revised: 17 January 2025

      Accepted: 18 February 2025

      Published: 16 September 2025

    • DOI: 10.11834/jig.240608     

  • Yang Biao, Liu Yuliang, Liu Qiang, Zhu Yingying. 2025. TextLLM: a document multimodal large model based on dynamic resolution. Journal of Image and Graphics, 30(9): 3068-3082. DOI: 10.11834/jig.240608.

Related Authors

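Zhu Yingying (School of Artificial Intelligence and Automation, Huazhong University of Science and Technology)
Liu Qiang (Wuhan Kingsoft Office Software Co., Ltd.)
Liu Yuliang (School of Artificial Intelligence and Automation, Huazhong University of Science and Technology)
Yang Biao (School of Artificial Intelligence and Automation, Huazhong University of Science and Technology)
Liu Yichen (Electronic Information School, Wuhan University)
Yu Lei (Electronic Information School, Wuhan University)
Yu Huai (Electronic Information School, Wuhan University)
Yang Wen (Electronic Information School, Wuhan University)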

Related Institutions

Wuhan Kingsoft Office Software Co., Ltd.
Electronic Information School, Wuhan University
State Key Laboratory of Multi-Modal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences
School of Artificial Intelligence, University of Chinese Academy of Sciences
School of Electronic and Information Engineering, South China University of Technology