TextLLM: a document multimodal large model based on dynamic resolution

Visual-Text Multimodal Large Language Models

"This work proposes TextLLM, a multimodal document model based on dynamic resolution. It processes high-resolution document images without relying on OCR tools and significantly improves document understanding performance."
Vol. 30, Issue 9, Pages: 3068-3082 (2025)
Received: 16 October 2024
Revised: 17 January 2025
Accepted: 18 February 2025
Published: 16 September 2025
DOI: 10.11834/jig.240608


Yang Biao, Liu Yuliang, Liu Qiang, Zhu Yingying. 2025. TextLLM: a document multimodal large model based on dynamic resolution. Journal of Image and Graphics, 30(9): 3068-3082. DOI: 10.11834/jig.240608.


Affiliations

Wuhan Kingsoft Office Software Co., Ltd.

Electronic Information School, Wuhan University

State Key Laboratory of Multi-Modal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences

School of Artificial Intelligence, University of Chinese Academy of Sciences

School of Electronic and Information Engineering, South China University of Technology
