Image Understanding and Computer Vision | Views: 0 Downloads: 203 CSCD: 0
    • Vision-language model driven object counting

    • Large-scale vision-language models have advanced object counting, but they face two major challenges: class-semantic misalignment and limitations of the decoder architecture. The authors propose the Cross-Branch Collaborative Alignment Network (CANet), which combines a dual-branch decoder architecture with a vision-text category alignment loss to address both problems. CANet performs strongly on multiple benchmark datasets and offers a new approach to improving counting robustness in complex scenes.
    • Vol. 31, Issue 1, Pages: 289-302 (2026)

      Received: 03 April 2025

      Revised: 06 June 2025

      Accepted: 18 June 2025

      Published: 16 January 2026

    • DOI: 10.11834/jig.250119     


  • Cao Feng, Zhang Xiaowen, Yue Zijie, Li Li, Shi Miaojing. 2026. Vision-language model driven object counting. Journal of Image and Graphics, 31(1): 289-302. DOI: 10.11834/jig.250119

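Related authors

Cao Feng, Zhejiang Rail Transit Operation Management Group Co., Ltd.
Zhang Xiaowen, School of Electronic and Information Engineering, Tongji University, Jiading, Shanghai
Yue Zijie, School of Electronic and Information Engineering, Tongji University, Jiading, Shanghai
Li Li, School of Electronic and Information Engineering, Tongji University, Jiading, Shanghai
Shi Miaojing, School of Electronic and Information Engineering, Tongji University, Jiading, Shanghai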

Related institutions

School of Electronic and Information Engineering, Tongji University