Vision-language model driven object counting

Image Understanding and Computer Vision

“Large-scale vision-language models have advanced object counting, but they face two major challenges: class-semantic misalignment and decoder architecture limitations. The authors propose the Cross-Branch Collaborative Alignment Network (CANet), which combines a dual-branch decoder architecture with a visual-text category alignment loss to address both problems. CANet achieves strong performance on multiple benchmark datasets and offers a new approach to improving counting robustness in complex scenes.”
Vol. 31, Issue 1, Pages: 289-302 (2026)
Received: 03 April 2025
Revised: 06 June 2025
Accepted: 18 June 2025
Published: 16 January 2026
DOI: 10.11834/jig.250119
Cao Feng, Zhang Xiaowen, Yue Zijie, Li Li, Shi Miaojing. 2026. Vision-language model driven object counting. Journal of Image and Graphics, 31(1): 289-302. DOI: 10.11834/jig.250119.

Related institutions

School of Electronic and Information Engineering, Tongji University
