Image Understanding and Computer Vision
Vision-language model driven object counting
- “Large-scale vision-language models have advanced object counting, but they face two major challenges: class semantic misalignment and decoder architecture limitations. Researchers propose the Cross Branch Collaborative Alignment Network (CANet), which combines a dual-branch decoder architecture with a visual-text category alignment loss to address both problems. CANet performs strongly on multiple benchmark datasets and offers a new approach to improving counting robustness in complex scenes.”
- Vol. 31, Issue 1, Pages: 289-302 (2026)
Received: 03 April 2025,
Revised: 06 June 2025,
Accepted: 18 June 2025,
Published: 16 January 2026
DOI: 10.11834/jig.250119
