Visual-Text Multimodal Large Language Models | Views: 0
Downloads: 562
CSCD: 0
Multimodal large model-based method for generating visual Q&A data for electronic document images
- “Recent research has advanced the generation of visual Q&A data for electronic documents, significantly improving the document-reading performance of multimodal large language models.”
- Vol. 30, Issue 9, Pages: 3083-3096 (2025)
Received: 16 October 2024,
Revised: 16 February 2025,
Accepted: 25 February 2025,
Published: 16 September 2025
DOI: 10.11834/jig.240610
