A review and frontier perspectives on end-to-end learned image and video coding
2026, pages 1-23
Received: 2025-12-12,
Revised: 2025-12-31,
Accepted: 2026-01-19,
Published online: 2026-01-20
DOI: 10.11834/jig.250627

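Since their inception, image and video coding and the associated standards have underpinned core multimedia services such as video on demand, live streaming, and video conferencing. For more than thirty years, the mainstream technical route has centered on the refined design and joint optimization of rule-driven modular tools (transform, prediction, entropy coding, in-loop filtering, and so on), with standardization bodies building the surrounding ecosystem. Over the past decade, as the representational power of deep learning, the accumulation of public datasets, and efficient training and inference frameworks have matured, end-to-end learned coding has iterated rapidly and has shown compression performance exceeding that of conventional standards on a number of test sets and in several application scenarios. This report focuses on image coding. Part 1 outlines the main line of evolution of mainstream end-to-end learned coding frameworks. Part 2 describes practical capabilities beyond rate-distortion performance, including variable rate and rate control, model quantization, and robustness. Part 3 summarizes the efforts and current status of bringing learned coding into, and influencing, the standardization process. Part 4 discusses the further extension from learned image coding to learned video coding. We hope this paper offers researchers and engineering practitioners a systematic perspective and promotes the orderly deployment of learned image and video coding in industrial-scale scenarios.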
For over three decades, image and video coding technologies and their associated international standards have served as the foundational compression engines underpinning core internet-scale multimedia services, ranging from on-demand streaming and live broadcasting to real-time video conferencing and social media sharing. Traditional approaches have predominantly followed a rule-driven, modular paradigm in which carefully engineered components (intra and inter prediction, transforms such as the block-based DCT and the DWT, scalar quantization, entropy coding, and in-loop filtering) are jointly optimized under classical Rate-Distortion (R-D) theory. This methodology, refined through successive generations of standards including JPEG, H.26x, MPEG, and AVS, has achieved remarkable efficiency and interoperability through the coordinated efforts of standardization bodies. However, the past decade has witnessed a paradigm shift catalyzed by the rapid advancement of deep learning, leading to the emergence of end-to-end learned image and video compression. Empowered by expressive neural architectures, large-scale public datasets, and mature training ecosystems, end-to-end trainable systems have demonstrated R-D performance that surpasses conventional codecs on a number of benchmark datasets. These learned systems are primarily built upon Variational Autoencoders (VAEs), which replace the handcrafted rules of traditional pipelines with a unified differentiable framework. In this architecture, an encoder uses analysis transforms to map image data into compact latent representations, while a decoder employs synthesis transforms to reconstruct the image. Unlike the linear transforms of traditional coding, these transforms are implemented by neural networks, which have evolved from early convolutional neural networks (CNNs) to architectures incorporating attention mechanisms, Transformers, and Mamba-based state-space models.
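As a point of reference, the objective that such VAE-style codecs typically minimize can be written as a Lagrangian rate-distortion cost. The notation below (analysis transform g_a, synthesis transform g_s, quantizer Q, entropy model p, trade-off weight λ, distortion measure d) is assumed here for illustration and is not taken from the abstract.

```latex
\begin{aligned}
y &= g_a(x;\phi), \qquad \hat{y} = Q(y), \qquad \hat{x} = g_s(\hat{y};\theta),\\
\mathcal{L}(\phi,\theta) &=
  \underbrace{\mathbb{E}_{x}\!\left[-\log_2 p_{\hat{y}}(\hat{y})\right]}_{\text{rate } R}
  \;+\; \lambda\,
  \underbrace{\mathbb{E}_{x}\!\left[d(x,\hat{x})\right]}_{\text{distortion } D}.
\end{aligned}
```

A larger λ favors fidelity at the cost of bitrate, so fixed-rate models are usually trained once per λ value.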
A critical challenge in this framework is the non-differentiable nature of quantization. To enable end-to-end optimization, methods such as Additive Uniform Noise (AUN) are used during training to approximate quantization errors while maintaining differentiability, or Straight-Through Estimators (STE) are employed to pass gradients directly through quantization layers. While uniform quantization remains standard, recent advancements explore vector quantization and non-uniform quantization strategies to further refine feature representation. The core of compression efficiency in these systems lies in entropy modeling, which estimates the probability distribution of latent variables to minimize the bitrate. This field has evolved significantly from early factorized models that assumed statistical independence among latents. The introduction of the hyperprior structure, which utilizes auxiliary latent variables to model the spatial distribution parameters of the primary latents, marked a significant milestone in capturing dependencies. Subsequent innovations introduced autoregressive modeling, which predicts current features based on causal contexts in spatial or channel dimensions, further enhancing probability estimation accuracy. Most recently, hierarchical autoregressive models have been developed to capture global and local contexts in a coarse-to-fine manner, pushing the boundaries of feature compactness and coding efficiency. In parallel, the optimization objectives have expanded beyond pixel-level fidelity metrics like MSE and MS-SSIM to include perceptual metrics and adversarial losses, allowing for a trade-off between signal distortion and perceptual quality. Beyond theoretical performance, the transition of learned coding from academic exploration to industrial application requires addressing practical dimensions such as variable rate control, hardware efficiency, and robustness. 
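The quantization surrogate and entropy model described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any particular codec's implementation: the function names are hypothetical, the latent is a single scalar, and the entropy model is a Gaussian whose mean and scale would, in a hyperprior design, be predicted from side information. The straight-through estimator is not shown because it needs an automatic-differentiation framework.

```python
import math
import random


def gaussian_cdf(x, mean, scale):
    """CDF of a Gaussian with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (scale * math.sqrt(2.0))))


def estimated_bits(y_hat, mean, scale, eps=1e-9):
    """Bits to code y_hat: -log2 of the Gaussian mass on [y_hat-0.5, y_hat+0.5]."""
    p = gaussian_cdf(y_hat + 0.5, mean, scale) - gaussian_cdf(y_hat - 0.5, mean, scale)
    return -math.log2(max(p, eps))  # eps guards against log(0)


def quantize(y, training, rng):
    """Training: additive uniform noise in [-0.5, 0.5]. Inference: hard rounding."""
    if training:
        return y + rng.uniform(-0.5, 0.5)
    return float(math.floor(y + 0.5))  # round half up, avoids banker's rounding


if __name__ == "__main__":
    rng = random.Random(0)
    y = 1.3
    print("noisy latent (training):", quantize(y, True, rng))
    print("rounded latent (inference):", quantize(y, False, rng))
    print("bits under N(0, 1):", estimated_bits(quantize(y, False, rng), 0.0, 1.0))
```

The noisy value stays within half a quantization step of the input, which is why the uniform-noise surrogate approximates the rounding error while keeping the rate term differentiable with respect to the latent.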
To support variable bitrates within a single model, researchers have developed mechanisms involving multi-scale decomposition and feature modulation, where quality factors or maps scale latent variables or intermediate features. Rate control algorithms have also advanced, utilizing iterative search strategies or deep modeling of the rate-parameter relationship to meet specific bandwidth constraints. To facilitate deployment on commodity hardware, model quantization techniques, including Quantization-Aware Training (QAT) and Post-Training Quantization (PTQ), are employed to convert floating-point models into fixed-point integer operations, ensuring cross-platform consistency and reducing computational overhead. Furthermore, addressing the vulnerability of neural networks to perturbations, robust coding frameworks are being designed to defend against adversarial attacks and transmission errors through training regularization and input preprocessing. These technical advancements have culminated in the integration of learned compression into formal international standards. Two landmark standards have recently emerged: JPEG AI, ratified by ITU-T as T.840.1, and IEEE 1857.11-2024. While both adopt VAE-based architectures, they differ in design philosophy. JPEG AI employs a multi-branch network design and emphasizes subjective quality and machine-task compatibility, optimizing for perception-oriented metrics like MS-SSIM and VMAF. In contrast, IEEE 1857.11 focuses on objective gains in PSNR and MS-SSIM, offering tiered complexity profiles (Base, Main, High) to adapt to different computational capabilities. Both standards have established rigorous training and evaluation protocols, including the use of specific datasets like Kodak and dedicated robustness benchmarks, to ensure fair comparison and reproducibility. The principles of learned image coding have naturally extended to video coding, although with unique challenges in temporal modeling. 
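One of the iterative search strategies for rate control mentioned above can be illustrated with bisection over a scalar quality parameter. The sketch assumes that the bitrate increases monotonically with the quality parameter q in [0, 1]. `toy_rate_bpp` is a hypothetical stand-in for actually running a variable-rate encoder and measuring bits per pixel; it is not a model of any real codec.

```python
def toy_rate_bpp(q):
    """Hypothetical rate curve: bits per pixel grows monotonically with q in [0, 1]."""
    return 0.05 + 1.95 * q * q


def search_quality(target_bpp, rate_fn, lo=0.0, hi=1.0, tol=1e-3, max_iter=32):
    """Bisection for the largest q whose rate does not exceed target_bpp."""
    if rate_fn(lo) > target_bpp:
        return lo  # target is below the lowest achievable rate
    if rate_fn(hi) <= target_bpp:
        return hi  # even the highest quality fits the budget
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if rate_fn(mid) <= target_bpp:
            lo = mid  # mid fits the budget, try higher quality
        else:
            hi = mid  # mid overshoots, try lower quality
        if hi - lo < tol:
            break
    return lo


if __name__ == "__main__":
    q = search_quality(0.5, toy_rate_bpp)
    print("selected q:", q, "achieved bpp:", toy_rate_bpp(q))
```

Each bisection step costs one encoder pass, which is the overhead that learned rate-parameter models aim to avoid by predicting the parameter directly.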
The evolution of neural video coding can be categorized into three developmental phases. The first phase involved hybrid approaches that replaced specific modules, such as intra prediction, with learned networks while retaining the traditional motion-compensated residual coding framework. The second phase moved toward conditional inter-frame coding, utilizing learned optical flow networks for motion estimation and warping to generate temporal contexts. The third and most recent phase represents a shift toward unified probabilistic frameworks that eliminate explicit motion estimation entirely. These systems leverage hierarchical spatial-temporal priors to perform joint intra and inter prediction within a single model, achieving performance that rivals or exceeds the latest H.266/VVC standard while approaching real-time processing speeds on GPUs. Looking forward, the field is converging toward two major trends: task-aware coding and generative integration. Task-aware coding aims to support both human vision and machine perception from a single bitstream, aligning with the biological principle of "compression as intelligence" where compact representations facilitate diverse downstream cognitive tasks. Simultaneously, the integration of generative models, such as diffusion models and large multimodal models, is enabling ultra-low-bitrate reconstruction with high semantic fidelity, fundamentally altering the rate-distortion-perception trade-off. This report synthesizes these technical, practical, and standardization advances to provide a comprehensive perspective. Ultimately, the future of intelligent compression lies in establishing a new foundation for multimodal, task-agnostic, and semantically aware visual communication.
Agustsson E , Mentzer F , Tschannen M , Cavigelli L , Timofte R , Benini L and Gool L V . 2017 . Soft-to-Hard Vector Quantization for End-to-End Learning Compressible Representations// Proceedings of the 31st International Conference on Neural Information Processing Systems (NeurIPS) . 1141 - 1151
Agustsson E , Minnen D , Johnston N , Ballé J , Hwang S J and Toderici G . 2020 . Scale-Space Flow for End-to-End Optimized Video Compression // IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) . Seattle, WA, USA : IEEE: 8500 – 8509 [ DOI: 10.1109/CVPR42600.2020.00853 http://dx.doi.org/10.1109/CVPR42600.2020.00853 ]
Agustsson E , Minnen D , Toderici G and Mentzer F . 2023 . Multi-Realism Image Compression with a Conditional Generator/ /IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 22324 – 22333 [ DOI: 10.1109/CVPR52729.2023.02138 http://dx.doi.org/10.1109/CVPR52729.2023.02138 ]
Agustsson E , Tschannen M , Mentzer F , Timofte R and Van Gool L . 2019 . Generative Adversarial Networks for Extreme Learned Image Compression/ /IEEE/CVF International Conference on Computer Vision (ICCV). 221 – 231 [ DOI: 10.1109/ICCV.2019.00031 http://dx.doi.org/10.1109/ICCV.2019.00031 ]
Ahmed N , Natarajan T R and Rao K R . 1974 . Discrete Cosine Transform . IEEE Transactions on Computers, C– 23 ( 1 ): 90 – 93 [ DOI: 10.1109/T-C.1974.223784 http://dx.doi.org/10.1109/T-C.1974.223784 ]
Akbari M , Liang J , Han J and Tu C . 2021 . Learned Multi-Resolution Variable-Rate Image Compression With Octave-Based Residual Blocks . IEEE Transactions on Multimedia , 23 : 3013 – 3021 [ DOI: 10.1109/TMM.2021.3068523 http://dx.doi.org/10.1109/TMM.2021.3068523 ]
Ascenso J , Alshina E and Ebrahimi T . 2023 . The JPEG AI Standard: Providing Efficient Human and Machine Visual Data Consumption . IEEE MultiMedia , 30 ( 1 ): 100 – 111 [ DOI: 10.1109/MMUL.2023.3245919 http://dx.doi.org/10.1109/MMUL.2023.3245919 ]
Ballé J , Chou P A , Minnen D , Singh S , Johnston N , Agustsson E , Hwang S J and Toderici G . 2021 . Nonlinear Transform Coding . IEEE Journal of Selected Topics in Signal Processing , 15 ( 2 ): 339 – 353 [ DOI: 10.1109/JSTSP.2020.3034501 http://dx.doi.org/10.1109/JSTSP.2020.3034501 ]
Ballé J , Johnston N and Minnen D . 2019 . Integer Networks for Data Compression with Latent-Variable Models // International Conference on Learning Representations (ICLR)
Ballé J , Laparra V and Simoncelli E P . 2016 . End-to-End Optimization of Nonlinear Transform Codes for Perceptual Quality // 2016 Picture Coding Symposium (PCS) . Nuremberg, Germany : IEEE: 1 – 5 [ DOI: 10.1109/PCS.2016.7906310 http://dx.doi.org/10.1109/PCS.2016.7906310 ]
Ballé J , Laparra V and Simoncelli E P . 2017 . End-to-End Optimized Image Compression // International Conference on Learning Representations (ICLR)
Ballé J , Minnen D , Singh S , Hwang S J and Johnston N . 2018 . Variational Image Compression with a Scale Hyperprior // International Conference on Learning Representations (ICLR)
Bińkowski M , Sutherland D J , Arbel M and Gretton A . 2018 . Demystifying MMD GANs // International Conference on Learning Representations (ICLR)
Blau Y and Michaeli T . 2019 . Rethinking Lossy Compression: The Rate-Distortion-Perception Tradeoff // International Conference on Machine Learning (ICML) . PMLR: 675 – 685
Bross B , Chen J , Ohm J-R , Sullivan G J and Wang Y-K . 2021 . Developments in International Video Coding Standardization After AVC, With an Overview of Versatile Video Coding (VVC) . Proceedings of the IEEE , 109 ( 9 ): 1463 – 1493 [ DOI: 10.1109/JPROC.2020.3043399 http://dx.doi.org/10.1109/JPROC.2020.3043399 ]
Cai C , Chen L , Zhang X and Gao Z . 2019 . Efficient Variable Rate Image Compression With Multi-Scale Decomposition Network . IEEE Transactions on Circuits and Systems for Video Technology , 29 ( 12 ): 3687 – 3700 [ DOI: 10.1109/TCSVT.2018.2880492 http://dx.doi.org/10.1109/TCSVT.2018.2880492 ]
Cai S , Chen L , Zhang Z , Zhao X , Zhou J , Peng Y , Yan L , Zhong S and Zou X . 2024 . I2C: Invertible Continuous Codec for High-Fidelity Variable-Rate Image Compression . IEEE Transactions on Pattern Analysis and Machine Intelligence , 46 ( 6 ): 4262 – 4279 [ DOI: 10.1109/TPAMI.2024.3356557 http://dx.doi.org/10.1109/TPAMI.2024.3356557 ]
Careil M , Muckley M J , Verbeek J and Lathuilière S . 2024 . Towards Image Compression with Perfect Realism at Ultra-Low Bitrates // International Conference on Learning Representations (ICLR)
Chen T , Liu H , Ma Z , Shen Q , Cao X and Wang Y . 2021 . End-to-End Learnt Image Compression via Non-Local Attention Optimization and Improved Context Modeling . IEEE Transactions on Image Processing , 30 : 3179 – 3191 [ DOI: 10.1109/TIP.2021.3058615 http://dx.doi.org/10.1109/TIP.2021.3058615 ]
Chen T , Liu H , Shen Q , Yue T , Cao X and Ma Z . 2017 . DeepCoder: A Deep Neural Network Based Video Compression //IEEE Visual Communications and Image Processing (VCIP). St . Petersburg, FL : IEEE: 1 – 4 [ DOI: 10.1109/VCIP.2017.8305033 http://dx.doi.org/10.1109/VCIP.2017.8305033 ]
Chen T and Ma Z . 2020 . Variable Bitrate Image Compression with Quality Scaling Factors // IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) . Barcelona, Spain : IEEE: 2163 – 2167 [ DOI: 10.1109/ICASSP40776.2020.9053885 http://dx.doi.org/10.1109/ICASSP40776.2020.9053885 ]
Chen T and Ma Z . 2023 . Toward Robust Neural Image Compression: Adversarial Attack and Model Finetuning . IEEE Transactions on Circuits and Systems for Video Technology , 33 ( 12 ): 7842 – 7856 [ DOI: 10.1109/TCSVT.2023.3276442 http://dx.doi.org/10.1109/TCSVT.2023.3276442 ]
Cheng Z , Sun H , Takeuchi M and Katto J . 2020 . Learned Image Compression with Discretized Gaussian Mixture Likelihoods and Attention Modules // IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) . Seattle, WA, USA : IEEE: 7936 - 7945 [ DOI: 10.1109/CVPR42600.2020.00796 http://dx.doi.org/10.1109/CVPR42600.2020.00796 ]
Choi Y , El-Khamy M and Lee J . 2019 . Variable Rate Deep Image Compression With a Conditional Autoencoder // IEEE/CVF International Conference on Computer Vision (ICCV) . Seoul, Korea (South) : IEEE: 3146 – 3154 [ DOI: 10.1109/ICCV.2019.00324 http://dx.doi.org/10.1109/ICCV.2019.00324 ]
Chua L O and Lin T . 1988 . A Neural Network Approach to Transform Image Coding . International Journal of Circuit Theory and Applications , 16 ( 3 ): 317 – 324 [ DOI: 10.1002/cta.4490160308 http://dx.doi.org/10.1002/cta.4490160308 ]
Cui Z , Wang J , Gao S , Guo T , Feng Y and Bai B . 2021 . Asymmetric Gained Deep Image Compression With Continuous Rate Adaptation // IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) . Nashville, TN, USA : IEEE: 10527 – 10536 [ DOI: 10.1109/CVPR46437.2021.01039 http://dx.doi.org/10.1109/CVPR46437.2021.01039 ]
Ding D , Ma Z , Chen D , Chen Q , Liu Z and Zhu F . 2021 . Advances in Video Compression System Using Deep Neural Network: A Review and Case Studies . Proceedings of the IEEE , 109 ( 9 ): 1494 – 1520 [ DOI: 10.1109/JPROC.2021.3095701 http://dx.doi.org/10.1109/JPROC.2021.3095701 ]
Dong M , Lu M and Ma Z . 2024 . Accelerating Block-Level Rate Control for Learned Image Compression/ /Data Compression Conference (DCC). 552 – 552 [ DOI: 10.1109/DCC58796.2024.00069 http://dx.doi.org/10.1109/DCC58796.2024.00069 ]
Dosovitskiy A and Djolonga J . 2020 . You Only Train Once: Loss-Conditional Training of Deep Networks // International Conference on Learning Representations (ICLR)
Duan Z , Lu M , Ma J , Huang Y , Ma Z and Zhu F . 2024 . QARV: Quantization-Aware ResNet VAE for Lossy Image Compression . IEEE Transactions on Pattern Analysis and Machine Intelligence , 46 ( 1 ): 436 – 450 [ DOI: 10.1109/TPAMI.2023.3322904 http://dx.doi.org/10.1109/TPAMI.2023.3322904 ]
Duan Z , Lu M , Ma Z and Zhu F . 2023 . Lossy Image Compression with Quantized Hierarchical Vaes // Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) . IEEE : 198 – 207 [ DOI: 10.1109/WACV56688.2023.00025 http://dx.doi.org/10.1109/WACV56688.2023.00025 ]
Duan Z , Ma Z and Zhu F . 2023 . Unified Architecture Adaptation for Compressed Domain Semantic Inference . IEEE Transactions on Circuits and Systems for Video Technology , 33 ( 8 ): 4108 – 4121 [ DOI: 10.1109/TCSVT.2023.3240391 http://dx.doi.org/10.1109/TCSVT.2023.3240391 ]
Dumas T , Roumy A and Guillemot C . 2018 . Autoencoder Based Image Compression: Can the Learning Be Quantization Independent? // IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) . Calgary, AB, Canada : IEEE: 1188 – 1192 [ DOI: 10.1109/ICASSP.2018.8462263 http://dx.doi.org/10.1109/ICASSP.2018.8462263 ]
El-Nouby A , Muckley M J , Ullrich K , Laptev I , Verbeek J and Jegou H . 2023 . Image Compression with Product Quantized Masked Image Modeling . Transactions on Machine Learning Research (TMLR)
Esser P , Rombach R and Ommer B . 2021 . Taming Transformers for High-Resolution Image Synthesis/ /Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 12873 – 12883 [ DOI: 10.1109/CVPR46437.2021.01268 http://dx.doi.org/10.1109/CVPR46437.2021.01268 ]
Faisal Hossain M . A , Duan Z and Zhu F . 2024 . Flexible Mixed Precision Quantization for Learned Image Compression // 2024 IEEE International Conference on Multimedia and Expo (ICME) . Niagara Falls, ON, Canada : IEEE: 1 – 8 [ DOI: 10.1109/ICME57554.2024.10687695 http://dx.doi.org/10.1109/ICME57554.2024.10687695 ]
Fang L , Jia W , Lin J , Tan M , Wang Y , Wu Q and Han X . 2025 . Introduction to Vision and Multimodal Large Models . Journal of Image and Graphics , 30 ( 5 ): 1195 - 1196
方乐缘 , 贾伟 , 林倞 , 谭明奎 , 王耀威 , 吴庆耀 , 韩向娣 . 2025 . 《中国图象图形学报》视觉及多模态大模型专栏简介 . 中国图象图形学报 , 30 ( 5 ): 1195 - 1196 [ DOI: 10.11834/jig.2500005 http://dx.doi.org/10.11834/jig.2500005 ]
Gao Y , Wu Y , Guo Z , Zhang Z and Chen Z . 2021 . Perceptual Friendly Variable Rate Image Compression/ /2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). 1916 – 1920 [ DOI: 10.1109/CVPRW53098.2021.00217 http://dx.doi.org/10.1109/CVPRW53098.2021.00217 ]
Ge Z , Ma S , Gao W , Pan J and Jia C . 2024 . NLIC: Non-Uniform Quantization-Based Learned Image Compression . IEEE Transactions on Circuits and Systems for Video Technology , 34 ( 10 ): 9647 – 9663 [ DOI: 10.1109/TCSVT.2024.3401872 http://dx.doi.org/10.1109/TCSVT.2024.3401872 ]
Goyal V K . 2001 . Theoretical Foundations of Transform Coding . IEEE Signal Processing Magazine , 18 ( 5 ): 9 – 21 [ DOI: 10.1109/79.952802 http://dx.doi.org/10.1109/79.952802 ]
Zamir R and Feder M . 1992 . On Universal Quantization by Randomized Uniform/Lattice Quantizers . IEEE Transactions on Information Theory , 38 ( 2 ): 428 – 436 [ DOI: 10.1109/18.119699 http://dx.doi.org/10.1109/18.119699 ]
Habibian A , Rozendaal T van , Tomczak J M and Cohen T S . 2019 . Video Compression With Rate-Distortion Autoencoders/ /IEEE/CVF International Conference on Computer Vision (ICCV). 7032 – 7041 [ DOI: 10.1109/ICCV.2019.00713 http://dx.doi.org/10.1109/ICCV.2019.00713 ]
Han J , Li B , Mukherjee D , Chiang C-H , Grange A , Chen C , Su H , Parker S , Deng S , Joshi U , Chen Y , Wang Y , Wilkins P , Xu Y and Bankoski J . 2021 . A Technical Overview of AV1 . Proceedings of the IEEE , 109 ( 9 ): 1435 – 1462 [ DOI: 10.1109/JPROC.2021.3058584 http://dx.doi.org/10.1109/JPROC.2021.3058584 ]
Hannuksela M M , Aksu E B , Vadakital V K M and Lainema J . 2015 . Overview of the High Efficiency Image File Format. JCTVC-V0072 . Joint Collaborative Team on Video Coding (JCT-VC)
He D , Yang Z , Chen Y , Zhang Q , Qin H and Wang Y . 2022 . Post-Training Quantization for Cross-Platform Learned Image Compression [EB/OL].[ 2022-11-30 ]. http://arxiv.org/abs/2202.07513.pdf http://arxiv.org/abs/2202.07513.pdf
He D , Yang Z , Peng W , Ma R , Qin H and Wang Y . 2022 . ELIC: Efficient Learned Image Compression with Unevenly Grouped Space-Channel Contextual Adaptive Coding // IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) . New Orleans, LA, USA : IEEE: 5708 – 5717 [ DOI: 10.1109/CVPR52688.2022.00563 http://dx.doi.org/10.1109/CVPR52688.2022.00563 ]
He D , Zheng Y , Sun B , Wang Y and Qin H . 2021 . Checkerboard Context Model for Efficient Learned Image Compression // IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) . Nashville, TN, USA : IEEE: 14766 – 14775 [ DOI: 10.1109/CVPR46437.2021.01453 http://dx.doi.org/10.1109/CVPR46437.2021.01453 ]
Heusel M , Ramsauer H , Unterthiner T , Nessler B and Hochreiter S . 2017 . GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium // Proceedings of the 31st International Conference on Neural Information Processing Systems . Red Hook, NY, USA : Curran Associates Inc.: 6629 – 6640
Hinton Geoffrey . 2012 . Neural Networks for Machine Learning [EB/OL]. University of Toronto : Coursera lecture series; lecturesvideo.
Ho Y . -H, Chan C . -C , Peng W.-H, Hang H.-M. and Domański M. 2021 . ANFIC: Image Compression Using Augmented Normalizing Flows. IEEE Open Journal of Circuits and Systems, 2 : 613 – 626 [ DOI: 10.1109/OJCAS.2021.3123201 http://dx.doi.org/10.1109/OJCAS.2021.3123201 ]
Hong W , Chen T , Lu M , Pu S and Ma Z . 2021 . Efficient Neural Image Decoding via Fixed-Point Inference . IEEE Transactions on Circuits and Systems for Video Technology , 31 ( 9 ): 3618 – 3630 [ DOI: 10.1109/TCSVT.2020.3040367 http://dx.doi.org/10.1109/TCSVT.2020.3040367 ]
Huang C , Liu H , Chen T , Shen Q and Ma Z . 2019 . Extreme Image Coding via Multiscale Autoencoders with Generative Adversarial Optimization/ /2019 IEEE Visual Communications and Image Processing (VCIP). 1 – 4 [ DOI: 10.1109/VCIP47243.2019.8966059 http://dx.doi.org/10.1109/VCIP47243.2019.8966059 ]
Hu Z , Lu G and Xu D . 2021 . FVC: A New Framework towards Deep Video Compression in Feature Space // 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) . Nashville, TN, USA : IEEE: 1502 – 1511 [ DOI: 10.1109/CVPR46437.2021.00155 http://dx.doi.org/10.1109/CVPR46437.2021.00155 ]
Huang Y , Zhang J , Shan Z and He J . 2024 . Compression Represents Intelligence Linearly // First Conference on Language Modeling (COLM) . University of Pennsylvania , Philadelphia, PA .
Hudson G , Léger A , Niss B , Sebestyén I. and Vaaben J . 2018 . JPEG-1 Standard 25 Years: Past, Present, and Future Reasons for a Success . Journal of Electronic Imaging , 27 ( 4 ): 040901 [ DOI: 10.1117/1.JEI.27.4.040901 http://dx.doi.org/10.1117/1.JEI.27.4.040901 ]
Huffman D . A . 1952 . A Method for the Construction of Minimum-Redundancy Codes. Proceedings of the IRE , 40 ( 9 ): 1098 – 1101 [ DOI: 10.1109/JRPROC.1952.273898 http://dx.doi.org/10.1109/JRPROC.1952.273898 ]
IEEE Std 1857 . 11 - 2024 . 2024 . IEEE Standard for Neural Network–Based Image Coding .
Jia P , Brand F , Yu D , Karabutov A , Alshina E. and Kaup A . 2025 . Overview of Variable Rate Coding in JPEG AI . IEEE Transactions on Circuits and Systems for Video Technology , 35 ( 9 ): 9460 – 9474 [ DOI: 10.1109/TCSVT.2025.3552971 http://dx.doi.org/10.1109/TCSVT.2025.3552971 ]
Jia C , Liu Z , Wang Y , Ma S. and Gao W . 2019 . Layered Image Compression Using Scalable Auto-Encoder // 2019 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR) . San Jose, CA, USA : IEEE: 431 – 436 [ DOI: 10.1109/MIPR.2019.00087 http://dx.doi.org/10.1109/MIPR.2019.00087 ]
Jia X , Wei X , Cao X. and Foroosh H . 2019 . ComDefend: An Efficient Image Compression Model to Defend Adversarial Examples // 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) . Long Beach, CA, USA : IEEE: 6077 – 6085 [ DOI: 10.1109/CVPR.2019.00624 http://dx.doi.org/10.1109/CVPR.2019.00624 ]
Jia Z , Li B , Li J , Xie W , Qi L , Li H. and Lu Y . 2025 . Towards Practical Real-Time Neural Video Compression // 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) . IEEE : 12543 – 12552 [ DOI: 10.1109/CVPR52734.2025.01170 http://dx.doi.org/10.1109/CVPR52734.2025.01170 ]
Jia Z , Li J , Li B , Li H. and Lu Y . 2024 . Generative Latent Coding for Ultra-Low Bitrate Image Compression // IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) . Seattle, WA, USA : IEEE: 26088 – 26098 [ DOI: 10.1109/CVPR52733.2024.02465 http://dx.doi.org/10.1109/CVPR52733.2024.02465 ]
Jia C , Zhao Z , Wang T , and Ma S . 2024 . Neural Network–Based Image and Video Coding [J]. Telecommunication Science , 35 ( 5 ): 32 – 42 .
贾川民 , 赵政辉 , 王苫社 , 马思伟 . 2024 . 基于神经网络的图像视频编码 [J]. 电信科学 , 35 ( 5 ): 32 – 42 .
Ke A , Zhang X , Chen T , Lu M , Zhou C , Gu J and Ma Z . 2025 . Ultra Low-rate Image Compression with Semantic Residual Coding and Compression-aware Diffusion // International Conference on Machine Learning (ICML) . Vancouver, Canada .
Kim J . -H , Jang S, Choi J. -H . and Lee J.-S. 2020 . Instability of Successive Deep Image Compression//Proceedings of the 28th ACM International Conference on Multimedia (ACM MM). Seattle, WA, USA : ACM: 247 – 255 [ DOI: 10.1145/3394171.3413680 http://dx.doi.org/10.1145/3394171.3413680 ]
Kovalev E , Bychkov G , Abud K , Gushchin A , Chistyakova A , Lavrushkin S , Vatolin D. and Antsiferova A . 2024 . Exploring Adversarial Robustness of JPEG AI: Methodology, Comparison and New Methods [EB/OL].[ 2024-11-18 ]. https://arxiv.org/abs/2411.11795 https://arxiv.org/abs/2411.11795
Ladune T , Philippe P , Hamidouche W , Zhang L. and Déforges O . 2020 . Optical Flow and Mode Selection for Learning-based Video Coding // 2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP) . Tampere, Finland : IEEE: 1 – 6 [ DOI: 10.1109/MMSP48831.2020.9287049 http://dx.doi.org/10.1109/MMSP48831.2020.9287049 ]
Laparra V , Ballé J , Berardino A. and Simoncelli E . P . 2016 . Perceptual Image Quality Assessment Using a Normalized Laplacian Pyramid. Electronic Imaging , 28 : 1 – 6 [ DOI: 10.2352/ISSN.2470-1173.2016.16.HVEI-103 http://dx.doi.org/10.2352/ISSN.2470-1173.2016.16.HVEI-103 ]
Lee J , Jeong S. and Kim M . 2022 . Selective Compression Learning of Latent Representations for Variable-Rate Image Compression // Advances in Neural Information Processing Systems (NeurIPS) , 35 : 13146 – 13157
Li J , Li B. and Lu Y . 2021 . Deep Contextual Video Compression // Advances in Neural Information Processing Systems (NeurIPS) , 34 : 18114 – 18125
Li J , Li B. and Lu Y . 2022 . Hybrid Spatial-Temporal Entropy Modelling for Neural Video Compression // Proceedings of the 30th ACM International Conference on Multimedia (ACM MM) . Lisbon, Portugal : 1503 – 1511 [ DOI: 10.1145/3503161.3547845 http://dx.doi.org/10.1145/3503161.3547845 ]
Li J , Li B. and Lu Y . 2023 . Neural Video Compression with Diverse Contexts // 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) . Vancouver, British Columbia, Canada : IEEE: 22616 – 22626 [ DOI: 10.1109/CVPR52729.2023.02166 http://dx.doi.org/10.1109/CVPR52729.2023.02166 ]
Li J , Li B. and Lu Y . 2024 . Neural Video Compression with Feature Modulation [C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) . Seattle, WA, USA : IEEE: 26099 – 26108 [ DOI: 10.1109/CVPR52733.2024.02466 http://dx.doi.org/10.1109/CVPR52733.2024.02466 .]
Li S , Dai W , Kan N , Li C , Zou J. and Xiong H . 2025 . Learnable Non-Uniform Quantization With Sampling-Based Optimization for Variable-Rate Learned Image Compression . IEEE Transactions on Circuits and Systems for Video Technology , 35 ( 8 ): 8314 – 8329 [ DOI: 10.1109/TCSVT.2025.3546765 http://dx.doi.org/10.1109/TCSVT.2025.3546765 ]
Li Y , Zhang H , Li L. and Liu D . 2025 . Learned Image Compression with Hierarchical Progressive Context Modeling // Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) . Music City Center, Nashville, TN, USA : IEEE: 18834 – 18843
Li Z , Aaron A , Katsavounidis I , Moorthy A and Manohara M . 2016 . Toward A Practical Perceptual Video Quality Metric . Netflix TechBlog
Li Z , Zhou Y , Wei H , Ge C and Jiang J . 2025 . Toward Extreme Image Compression with Latent Feature Guidance and Diffusion Prior . IEEE Transactions on Circuits and Systems for Video Technology , 35 ( 1 ): 888 – 899 [ DOI: 10.1109/TCSVT.2024.3455576 http://dx.doi.org/10.1109/TCSVT.2024.3455576 ]
Liu D , Li Y , Lin J , Li H. and Wu F . 2021 . Deep Learning-Based Video Coding: A Review and a Case Study . ACM Computing Surveys (CSUR) , 53 ( 1 ): 1 – 35 [ DOI: 10.1145/3368405 http://dx.doi.org/10.1145/3368405 ]
Liu H , Shen H , Huang L , Lu M , Chen T and Ma Z . 2020 . Learned Video Compression via Joint Spatial-Temporal Correlation Exploration . Proceedings of the AAAI Conference on Artificial Intelligence , 34 ( 07 ): 11580 – 11587 [ DOI: 10.1609/aaai.v34i07.6825 http://dx.doi.org/10.1609/aaai.v34i07.6825 ]
Liu H , Lu M , Ma Z , Wang F , Xie Z , Cao X and Wang Y . 2021 . Neural Video Coding Using Multiscale Motion Compensation and Spatiotemporal Context Model . IEEE Transactions on Circuits and Systems for Video Technology , 31 ( 8 ): 3182 – 3196 [ DOI: 10.1109/TCSVT.2020.3035680 http://dx.doi.org/10.1109/TCSVT.2020.3035680 ]
Liu J , Liu D , Yang W , Xia S , Zhang X. and Dai Y . 2020 . A Comprehensive Benchmark for Single Image Compression Artifact Reduction . IEEE Transactions on Image Processing , 29 : 7845 – 7860 [ DOI: 10.1109/TIP.2020.3007828 http://dx.doi.org/10.1109/TIP.2020.3007828 ]
Liu J , Wang S , Ma W . C , Shah M , Hu R , Dhawan P , and Urtasun R . 2020 . Conditional Entropy Coding for Efficient Video Compression [C]// European Conference on Computer Vision (ECCV) . Cham : Springer International Publishing: 453 – 468 [ DOI: https://doi.org/10.1007/978-3-030-58520-4_27 http://dx.doi.org/https://doi.org/10.1007/978-3-030-58520-4_27 ]
Liu J , Sun H and Katto J . 2023 . Learned Image Compression with Mixed Transformer-CNN Architectures/ /2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 14388 – 14397 [ DOI: 10.1109/CVPR52729.2023.01383 http://dx.doi.org/10.1109/CVPR52729.2023.01383 ]
Liu K , Wu D , Wu Y , Wang Y , Feng D , Tan B and Garg S . 2024 . Manipulation Attacks on Learned Image Compression . IEEE Transactions on Artificial Intelligence , 5 ( 6 ): 3083 – 3097 [ DOI: 10.1109/TAI.2023.3340982 http://dx.doi.org/10.1109/TAI.2023.3340982 ]
Lu G , Ouyang W , Xu D , Zhang X , Cai C. and Gao Z . 2019 . DVC: An End-To-End Deep Video Compression Framework // 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) . Long Beach, CA, USA : IEEE: 10998 – 11007 [ DO I: 10.1109/CVPR.2019.01126 http://dx.doi.org/10.1109/CVPR.2019.01126 ]
Lu M , Chen F , Pu S. and Ma Z . 2022 . High-Efficiency Lossy Image Coding Through Adaptive Neighborhood Information Aggregation [EB/OL].[ 2022-10-12 ]. http://arxiv.org/abs/2204.11448 http://arxiv.org/abs/2204.11448
Lu M , Duan Z , Zhu F. and Ma Z . 2024 . Deep Hierarchical Video Compression . Proceedings of the AAAI Conference on Artificial Intelligence , 38 ( 8 ): 8859 – 8867 [ DOI: 10.1609/aaai.v38i8.28733 http://dx.doi.org/10.1609/aaai.v38i8.28733 ]
Lu M , Guo P , Shi H , Cao C and Ma Z . 2021 . Transformer-Based Image Compression . 2022 Data Compression Conference (DCC) , 469 – 469 [ DOI: 10.1109/DCC52660.2022.00080 http://dx.doi.org/10.1109/DCC52660.2022.00080 ]
Lu X , Wang H , Dong W , Wu F , Zheng Z and Shi G . 2019 . Learning a Deep Vector Quantization Network for Image Compression . IEEE Access , 7 : 118815 – 118825 [ DOI: 10.1109/ACCESS.2019.2934731 http://dx.doi.org/10.1109/ACCESS.2019.2934731 ]
Ma S , Gao W , Yuan L and Lu Y . 2004 . A Rate-Control Algorithm for H.264/AVC . Journal of Electronics (Dianzi Xuebao) , 32 ( 12 ): 2024 – 2027 .
马思伟 , 高文 , 袁禄军 , 等 . 2004 . 一种面向 H.264/AVC 的码率控制算法[J]. 电子学报 , 32 ( 12 ): 2024 – 2027 .
Ma S , Jia C , Zhao Z and Wang S . 2020 . Intelligent Video Coding [J]. Artificial Intelligence , 2020( 2 ): 20 – 28 .
马思伟 , 贾川民 , 赵政辉 , 等 . 2020 . 智能视频编码 [J]. 人工智能 , 2020( 2 ): 20 – 28 .
Ma H , Liu D , Yan N , Li H and Wu F . 2022 . End-to-End Optimized Versatile Image Compression With Wavelet-Like Transform . IEEE Transactions on Pattern Analysis and Machine Intelligence , 44 ( 3 ): 1247 – 1263 [ DOI: 10.1109/TPAMI.2020.3026003 http://dx.doi.org/10.1109/TPAMI.2020.3026003 ]
Ma S , Zhang X , Jia C , Zhao Z , Wang S and Wang S . 2020 . Image and Video Compression With Neural Networks: A Review . IEEE Transactions on Circuits and Systems for Video Technology , 30 ( 6 ): 1683 – 1698 [ DOI: 10.1109/TCSVT.2019.2910119 http://dx.doi.org/10.1109/TCSVT.2019.2910119 ]
Marcellin M W , Gormish M J , Bilgin A and Boliek M P . 2000 . An Overview of JPEG-2000 // Data Compression Conference (DCC) . Snowbird, UT, USA : 523 – 541 [ DOI: 10.1109/DCC.2000.838192 http://dx.doi.org/10.1109/DCC.2000.838192 ]
Madry A , Makelov A , Schmidt L , Tsipras D and Vladu A . 2018 . Towards Deep Learning Models Resistant to Adversarial Attacks // International Conference on Learning Representations (ICLR)
Mentzer F , Toderici G , Minnen D , Hwang S-J , Caelles S , Lucic M and Agustsson E . 2022 . VCT: A Video Compression Transformer // Proceedings of the 36th International Conference on Neural Information Processing Systems (NeurIPS) , 35 : 13091 – 13103
Mentzer F , Toderici G , Tschannen M and Agustsson E . 2020 . High-fidelity Generative Image Compression // Proceedings of the 34th International Conference on Neural Information Processing Systems (NeurIPS) , 33 : 11913 – 11924
Minnen D , Ballé J and Toderici G . 2018 . Joint Autoregressive and Hierarchical Priors for Learned Image Compression // Proceedings of the 32nd International Conference on Neural Information Processing Systems (NeurIPS) . Red Hook, NY, USA : Curran Associates Inc.: 10794 – 10803
Minnen D and Singh S . 2020 . Channel-Wise Autoregressive Entropy Models for Learned Image Compression // 2020 IEEE International Conference on Image Processing (ICIP) . Abu Dhabi, United Arab Emirates : 3339 – 3343 [ DOI: 10.1109/ICIP40778.2020.9190935 http://dx.doi.org/10.1109/ICIP40778.2020.9190935 ]
Mittal A , Soundararajan R and Bovik A C . 2013 . Making a “Completely Blind” Image Quality Analyzer . IEEE Signal Processing Letters , 20 ( 3 ): 209 – 212 [ DOI: 10.1109/LSP.2012.2227726 http://dx.doi.org/10.1109/LSP.2012.2227726 ]
Muckley M J , El-Nouby A , Ullrich K , Jégou H and Verbeek J . 2023 . Improving Statistical Fidelity for Neural Image Compression with Implicit Local Likelihood Models // International Conference on Machine Learning (ICML) . PMLR: 25426 – 25443
Nagel M , Fournarakis M , Amjad R , Bondarenko Y , Baalen M and Blankevoort T . 2021 . A White Paper on Neural Network Quantization [EB/OL].[ 2021-06-15 ]. http://arxiv.org/abs/2106.08295
Nakanishi K M , Maeda S , Miyato T and Okanohara D . 2019 . Neural Multi-Scale Image Compression // Asian Conference on Computer Vision (ACCV) . Cham : Springer International Publishing: 718 – 732 [ DOI: 10.1007/978-3-030-20876-9_45 http://dx.doi.org/10.1007/978-3-030-20876-9_45 ]
Ortega A and Ramchandran K . 1998 . Rate-Distortion Methods for Image and Video Compression . IEEE Signal Processing Magazine , 15 ( 6 ): 23 – 50 [ DOI: 10.1109/79.733495 http://dx.doi.org/10.1109/79.733495 ]
Watson A B . 1998 . Toward a Perceptual Video-Quality Metric // Human Vision and Electronic Imaging III . SPIE : 139 – 147 [ DOI: 10.1117/12.320105 http://dx.doi.org/10.1117/12.320105 ]
Pan X , Ding G , Chen Z and Chen C . 2025 . An Efficient Neural Rate Control for JPEG-AI . IEEE Transactions on Circuits and Systems for Video Technology , 1 – 1 [ DOI: 10.1109/TCSVT.2025.3614007 http://dx.doi.org/10.1109/TCSVT.2025.3614007 ]
Ponomarenko N , Silvestri F , Egiazarian K , Carli M , Astola J and Lukin V . 2007 . On Between-Coefficient Contrast Masking of DCT Basis Functions // Proceedings of the Third International Workshop on Video Processing and Quality Metrics . Scottsdale USA
Presta A , Tartaglione E , Fiandrotti A and Grangetto M . 2024 . STanH : Parametric Quantization for Variable Rate Learned Image Compression . IEEE Transactions on Image Processing , 34 : 639 - 651 [ DOI: 10.1109/TIP.2025.3527883 http://dx.doi.org/10.1109/TIP.2025.3527883 ]
Ranjan A and Black M J . 2017 . Optical Flow Estimation Using a Spatial Pyramid Network // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) . Honolulu, HI, USA : IEEE: 2720 - 2729 [ DOI: 10.1109/CVPR.2017.291 http://dx.doi.org/10.1109/CVPR.2017.291 ]
Rippel O and Bourdev L . 2017 . Real-Time Adaptive Image Compression // Proceedings of the 34th International Conference on Machine Learning . Sydney, NSW, Australia : PMLR: 70 : 2922 - 2930
Rumelhart D E , Hinton G E and Williams R J . 1986 . Learning Representations by Back-Propagating Errors . Nature , 323 ( 6088 ): 533 – 536 [ DOI: 10.1038/323533a0 http://dx.doi.org/10.1038/323533a0 ]
Sha H , Dong M , Luo Q , Lu M , Chen H and Ma Z . 2025 . Towards Loss-Resilient Image Coding for Unstable Satellite Networks . Proceedings of the AAAI Conference on Artificial Intelligence , 39 ( 12 ): 12506 - 12514 [ DOI: 10.1609/aaai.v39i12.33363 http://dx.doi.org/10.1609/aaai.v39i12.33363 ]
Sheikh H R and Bovik A C . 2006 . Image Information and Visual Quality . IEEE Transactions on Image Processing , 15 ( 2 ): 430 – 444 [ DOI: 10.1109/TIP.2005.859378 http://dx.doi.org/10.1109/TIP.2005.859378 ]
Sheng X , Li J , Li B , Li L , Liu D and Lu Y . 2023 . Temporal Context Mining for Learned Video Compression . IEEE Transactions on Multimedia , 25 : 7311 – 7322 [ DOI: 10.1109/TMM.2022.3220421 http://dx.doi.org/10.1109/TMM.2022.3220421 ]
Shi J , Lu M and Ma Z . 2023 . Rate-Distortion Optimized Post-Training Quantization for Learned Image Compression . IEEE Transactions on Circuits and Systems for Video Technology , 34 ( 5 ): 3082 - 3095 [ DOI: 10.1109/TCSVT.2023.3323015 http://dx.doi.org/10.1109/TCSVT.2023.3323015 ]
Song M , Choi J and Han B . 2021 . Variable-Rate Deep Image Compression through Spatially-Adaptive Feature Transform // IEEE/CVF International Conference on Computer Vision (ICCV) . Montreal, QC, Canada : IEEE: 2360 – 2369 [ DOI: 10.1109/ICCV48922.2021.00238 http://dx.doi.org/10.1109/ICCV48922.2021.00238 ]
Spadaro G , Presta A , Tartaglione E , Giraldo J H , Grangetto M and Fiandrotti A . 2024 . Gabic: Graph-Based Attention Block for Image Compression // IEEE International Conference on Image Processing (ICIP) . IEEE : 1802 – 1808 [ DOI: 10.1109/ICIP51287.2024.10647413 http://dx.doi.org/10.1109/ICIP51287.2024.10647413 ]
Su R , Cheng Z , Sun H and Katto J . 2020 . Scalable Learned Image Compression With A Recurrent Neural Networks-Based Hyperprior // 2020 IEEE International Conference on Image Processing (ICIP) . Abu Dhabi, United Arab Emirates : IEEE: 3369 – 3373 [ DOI: 10.1109/ICIP40778.2020.9190704 http://dx.doi.org/10.1109/ICIP40778.2020.9190704 ]
Sullivan G J and Wiegand T . 2005 . Video Compression—From Concepts to the H.264/AVC Standard . Proceedings of the IEEE , 93 ( 1 ): 18 – 31 [ DOI: 10.1109/JPROC.2004.839612 http://dx.doi.org/10.1109/JPROC.2004.839612 ]
Sun H , Cheng Z , Takeuchi M and Katto J . 2020 . End-To-End Learned Image Compression With Fixed Point Weight Quantization // 2020 IEEE International Conference on Image Processing (ICIP) . Abu Dhabi, United Arab Emirates : IEEE: 3359 – 3363 [ DOI: 10.1109/ICIP40778.2020.9190805 http://dx.doi.org/10.1109/ICIP40778.2020.9190805 ]
Sun H , Yu L and Katto J . 2021 . Learned Image Compression with Fixed-Point Arithmetic // 2021 Picture Coding Symposium (PCS) . Bristol, United Kingdom : IEEE: 1 – 5 [ DOI: 10.1109/PCS50896.2021.9477496 http://dx.doi.org/10.1109/PCS50896.2021.9477496 ]
Sun H , Yu L and Katto J . 2025 . Q-LIC: Quantizing Learned Image Compression With Channel Splitting . IEEE Transactions on Circuits and Systems for Video Technology , 35 ( 4 ): 3798 – 3811 [ DOI: 10.1109/TCSVT.2022.3231789 http://dx.doi.org/10.1109/TCSVT.2022.3231789 ]
Tang Z , Wang H , Yi X , Zhang Y , Kwong S and Kuo C-C J . 2022 . Joint Graph Attention and Asymmetric Convolutional Neural Network for Deep Image Compression . IEEE Transactions on Circuits and Systems for Video Technology , 33 ( 1 ): 421 – 433 [ DOI: 10.1109/TCSVT.2022.3187654 http://dx.doi.org/10.1109/TCSVT.2022.3187654 ]
Theis L , Shi W , Cunningham A and Huszár F . 2017 . Lossy Image Compression with Compressive Autoencoders // 5th International Conference on Learning Representations . Toulon, France
Toderici G , O’Malley S M , Hwang S J , Vincent D , Minnen D , Baluja S , Covell M and Sukthankar R . 2016 . Variable Rate Image Compression with Recurrent Neural Networks // 4th International Conference on Learning Representations . San Juan, Puerto Rico
Toderici G , Vincent D , Johnston N , Hwang S J , Minnen D , Shor J and Covell M . 2017 . Full Resolution Image Compression with Recurrent Neural Networks // 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) . Honolulu, HI : IEEE: 5435 – 5443 [ DOI: 10.1109/CVPR.2017.577 http://dx.doi.org/10.1109/CVPR.2017.577 ]
Tong K , Wu Y , Li Y , Zhang K , Zhang L and Jin X . 2023 . QVRF: A Quantization-Error-Aware Variable Rate Framework for Learned Image Compression // 2023 IEEE International Conference on Image Processing (ICIP) . Kuala Lumpur, Malaysia : IEEE: 1310 – 1314 [ DOI: 10.1109/ICIP49359.2023.10222717 http://dx.doi.org/10.1109/ICIP49359.2023.10222717 ]
Van Den Oord A and Vinyals O . 2017 . Neural Discrete Representation Learning // Proceedings of the 31st International Conference on Neural Information Processing Systems (NeurIPS) . Red Hook, NY, USA : Curran Associates Inc: 6309 – 6318
Wang L , Jiang Y , Wang T , Wang X and Dang J . 2025 . Information Disentanglement-Based Self-Supervised Learning Speech Pretrained Large Model . Journal of Image and Graphics , 30 ( 5 ): 1272 - 1285
王龙标 , 江宇 , 王天锐 , 王晓宝 , 党建武 . 2025 . 信息解耦式自监督预训练语音大模型 . 中国图象图形学报 , 30 ( 5 ): 1272 - 1285 [ DOI: 10.11834/jig.240607 http://dx.doi.org/10.11834/jig.240607 ]
Wang X , Zhang C , Ren W , Fu X , Zhou T , Zhao F , Shi Z and Chen X . 2025 . Introduction to Visual State Space Models and Applications . Journal of Image and Graphics , 30 ( 10 ): 3171 - 3172
王兴刚 , 张长青 , 任文琦 , 傅雪阳 , 周涛 , 赵峰 , 石争浩 , 陈秀妍 . 2025 . 《中国图象图形学报》视觉状态空间模型及应用专栏简介 . 中国图象图形学报 , 30 ( 10 ): 3171 - 3172 [ DOI: 10.11834/jig.2500010 http://dx.doi.org/10.11834/jig.2500010 ]
Wang X , Lu M and Ma Z . 2022 . Block-Level Rate Control for Learnt Image Coding // 2022 Picture Coding Symposium (PCS) . San Jose, CA, USA : IEEE: 157 – 161 [ DOI: 10.1109/PCS56426.2022.10018043 http://dx.doi.org/10.1109/PCS56426.2022.10018043 ]
Wang Z , Bovik A C , Sheikh H R and Simoncelli E P . 2004 . Image Quality Assessment: From Error Visibility to Structural Similarity . IEEE Transactions on Image Processing , 13 ( 4 ): 600 – 612 [ DOI: 10.1109/TIP.2003.819861 http://dx.doi.org/10.1109/TIP.2003.819861 ]
Wang Z and Li Q . 2010 . Information Content Weighting for Perceptual Image Quality Assessment . IEEE Transactions on Image Processing , 20 ( 5 ): 1185 – 1198 [ DOI: 10.1109/TIP.2010.2092435 http://dx.doi.org/10.1109/TIP.2010.2092435 ]
Wang Z , Simoncelli E P and Bovik A C . 2003 . Multiscale Structural Similarity for Image Quality Assessment // The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003 . Pacific Grove, CA, USA : IEEE: 1398 – 1402 [ DOI: 10.1109/ACSSC.2003.1292216 http://dx.doi.org/10.1109/ACSSC.2003.1292216 ]
Wei Y , Mao T , Li B , Wang F , Li F , Zhang Z and Zhao Y . 2025 . Visual and Large Multimodal Models Promote Image Restoration and Enhancement: Research Progress . Journal of Image and Graphics , 30 ( 5 ): 1197 - 1219
韦炎炎 , 毛天一 , 李柏昂 , 王飞 , 李锋 , 张召 , 赵洋 . 2025 . 视觉模型及多模态大模型推进图像复原增强研究进展 . 中国图象图形学报 , 30 ( 5 ): 1197 - 1219 [ DOI: 10.11834/jig.240436 http://dx.doi.org/10.11834/jig.240436 ]
Xie Y , Cheng K L and Chen Q . 2021 . Enhanced Invertible Encoding for Learned Image Compression // Proceedings of the 29th ACM International Conference on Multimedia (MM ’21) . ACM : 162 – 170 [ DOI: 10.1145/3474085.3475213 http://dx.doi.org/10.1145/3474085.3475213 ]
Xue N and Zhang Y . 2023 . Lambda-Domain Rate Control for Neural Image Compression // Proceedings of the 5th ACM International Conference on Multimedia in Asia . Tainan, Taiwan : ACM: 1 – 7 [ DOI: 10.1145/3595916.3626372 http://dx.doi.org/10.1145/3595916.3626372 ]
Yang C , Ma Y , Yang J , Liu S and Wang R . 2021 . Graph-Convolution Network for Image Compression // IEEE International Conference on Image Processing (ICIP) . Anchorage, AK, USA : IEEE: 2094 – 2098 [ DOI: 10.1109/ICIP42928.2021.9506704 http://dx.doi.org/10.1109/ICIP42928.2021.9506704 ]
Yang F , Herranz L , Cheng Y and Mozerov M G . 2021 . Slimmable Compressive Autoencoders for Practical Neural Image Compression // 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) . Nashville, TN, USA : IEEE: 4996 – 5005 [ DOI: 10.1109/CVPR46437.2021.00496 http://dx.doi.org/10.1109/CVPR46437.2021.00496 ]
Yang F , Herranz L , Weijer J V D , Guitian J A I , Lopez A M and Mozerov M G . 2020 . Variable Rate Deep Image Compression With Modulated Autoencoder . IEEE Signal Processing Letters , 27 : 331 – 335 [ DOI: 10.1109/LSP.2020.2970539 http://dx.doi.org/10.1109/LSP.2020.2970539 ]
Yang R , Mentzer F , Gool L V and Timofte R . 2021 . Learning for Video Compression with Recurrent Auto-Encoder and Recurrent Probability Model . IEEE Journal of Selected Topics in Signal Processing , 15 ( 2 ): 388 – 401 [ DOI: 10.1109/JSTSP.2020.3043590 http://dx.doi.org/10.1109/JSTSP.2020.3043590 ]
Yang W , Huang H , Hu Y , Duan L-Y and Liu J . 2024 . Video Coding for Machines: Compact Visual Representation Compression for Intelligent Collaborative Analytics . IEEE Transactions on Pattern Analysis and Machine Intelligence , 46 ( 7 ): 5174 – 5191 [ DOI: 10.1109/TPAMI.2024.3367293 http://dx.doi.org/10.1109/TPAMI.2024.3367293 ]
Yu J , Mai S , Zhang P , Jiang Y and Cheng J . 2025 . Mixed-Precision Post-Training Quantization for Learned Image Compression . IEEE Internet of Things Journal , 12 ( 16 ): 34392 – 34405 [ DOI: 10.1109/JIOT.2025.3578318 http://dx.doi.org/10.1109/JIOT.2025.3578318 ]
Yu Y , Wang Y , Yang W , Lu S , Tan Y-P and Kot A C . 2023 . Backdoor Attacks Against Deep Image Compression via Adaptive Frequency Trigger // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) . Vancouver, BC, Canada : IEEE: 12250 – 12259 [ DOI: 10.1109/CVPR52729.2023.01179 http://dx.doi.org/10.1109/CVPR52729.2023.01179 ]
Zeng F , Tang H , Shao Y , Chen S , Shao L and Wang Y . 2025 . MambaIC: State Space Models for High-Performance Learned Image Compression // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) . Nashville, TN, USA
Zhang H , Li L and Liu D . 2025 . Generalized Gaussian Model for Learned Image Compression . IEEE Transactions on Image Processing , 34 : 1950 – 1965 [ DOI: 10.1109/TIP.2025.3550013 http://dx.doi.org/10.1109/TIP.2025.3550013 ]
Zhang J , Jia C , Lei M , Wang S , Ma S and Gao W . 2019 . Recent Development of AVS Video Coding Standard: AVS3 // 2019 Picture Coding Symposium (PCS) . Ningbo, China : IEEE: 1 – 5 [ DOI: 10.1109/PCS48529.2019.8954589 http://dx.doi.org/10.1109/PCS48529.2019.8954589 ]
Zhang L , Zhang L , Mou X and Zhang D . 2011 . FSIM: A Feature Similarity Index for Image Quality Assessment . IEEE Transactions on Image Processing , 20 ( 8 ): 2378 – 2386 [ DOI: 10.1109/TIP.2011.2109730 http://dx.doi.org/10.1109/TIP.2011.2109730 ]
Zhang Q , Mei J , Guan T , Sun Z , Zhang Z and Yu L . 2024 . Recent Advances in Video Coding for Machines Standard and Technologies . ZTE Communications , 22 ( 1 ): 62 - 76 .
Zhang R , Isola P , Efros A A , Shechtman E and Wang O . 2018 . The Unreasonable Effectiveness of Deep Features as a Perceptual Metric // 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Salt Lake City, USA : IEEE: 586 – 595 [ DOI: 10.1109/CVPR.2018.00068 http://dx.doi.org/10.1109/CVPR.2018.00068 ]
Zhang T , Luo X , Li L and Liu D . 2025 . StableCodec: Taming One-Step Diffusion for Extreme Image Compression // Proceedings of the IEEE/CVF International Conference on Computer Vision .
Zhang X , Guo P , Lu M and Ma Z . 2024 . All-in-One Image Coding for Joint Human-Machine Vision with Multi-Path Aggregation // Proceedings of the 38th International Conference on Neural Information Processing Systems . Vancouver, BC, Canada : Curran Associates Inc. : 71465 – 71503 [ DOI: 10.5555/3737916.3740199 http://dx.doi.org/10.5555/3737916.3740199 ]
Zhang X , Lu M , Chen Y and Ma Z . 2025 . Perception-Oriented Latent Coding for High-Performance Compressed Domain Semantic Inference // IEEE International Conference on Multimedia and Expo (ICME) . Nantes, France : IEEE: 1 - 6 [ DOI: 10.1109/ICME59968.2025.11209906 http://dx.doi.org/10.1109/ICME59968.2025.11209906 ]
Zhang X and Wu X . 2023 . LVQAC: Lattice Vector Quantization Coupled with Spatially Adaptive Companding for Efficient Learned Image Compression // 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) . Vancouver, BC, Canada : IEEE: 10239 - 10248 [ DOI: 10.1109/CVPR52729.2023.00987 http://dx.doi.org/10.1109/CVPR52729.2023.00987 ]
Zhang Y , Zhang R , Zhou H , Qi J , Yu Z and Huang T . 2025 . Research Status and Development Trends of Vision Foundation Models . Journal of Image and Graphics , 30 ( 01 ): 0001 - 0024
张燚钧 , 张润清 , 周华健 , 齐骥 , 余肇飞 , 黄铁军 . 2025 . 视觉基础模型研究现状与发展趋势 . 中国图象图形学报 , 30 ( 01 ): 0001 - 0024 [ DOI: 10.11834/jig.230911 http://dx.doi.org/10.11834/jig.230911 ]
Zhan R , Fan Y , Zhou L , Xie Y , Chen J , Yang H , Huang D and Wang Y . Dynamically Distribution-Aware Quantization for Diffusion Models [J/OL]. Journal of Image and Graphics , 2025 , 1 - 10
占瑞乙 , 樊轶 , 周丽娜 , 谢宇宝 , 陈佳鑫 , 杨鸿宇 , 黄迪 , 王蕴红 . 分布范围动态感知的扩散模型量化 [J/OL]. 中国图象图形学报 , 2025 , 1 - 10 [DOI: 10.11834/jig.250319]
Zheng Y , Yu Z and Huang T . 2023 . A Literature Review for Neural Networks-Based Encoding Models of Biological Visual System . Journal of Image and Graphics , 28 ( 2 ): 335 - 357
郑雅菁 , 余肇飞 , 黄铁军 . 生物视觉系统的神经网络编码模型综述 [J]. 中国图象图形学报 , 2023 , 28 ( 2 ): 335 - 357 [ DOI: 10.11834/jig.220461 http://dx.doi.org/10.11834/jig.220461 ]
Zheng J and Meister M . 2025 . The Unbearable Slowness of Being: Why Do We Live at 10 Bits/s? Neuron , 113 ( 2 ): 192 – 204 [ DOI: 10.1016/j.neuron.2024.11.008 http://dx.doi.org/10.1016/j.neuron.2024.11.008 ]
Zheng Y , Chen Y , Qian B , Shi X , Shu Y and Chen J . 2025 . A Review on Edge Large Language Models: Design, Execution, and Applications . ACM Computing Surveys , 57 ( 8 ): 1 – 35 [ DOI: 10.1145/3719664 http://dx.doi.org/10.1145/3719664 ]
Zhou L , Sun Z , Wu X and Wu J . 2019 . End-to-End Optimized Image Compression with Attention Mechanism // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops . California, USA
Zhu Y , Yang Y and Cohen T . 2022 . Transformer-Based Transform Coding // The Tenth International Conference on Learning Representations (Virtual)
Zhu W , Wang X , Tian Y and Gao W . 2022 . Multimedia Intelligence: When Multimedia Meets Artificial Intelligence . Journal of Image and Graphics , 27 ( 9 ): 2551 - 2573
朱文武 , 王鑫 , 田永鸿 , 高文 . 2022 . 多媒体智能: 当多媒体遇到人工智能 [J]. 中国图象图形学报 , 27 ( 9 ): 2551 - 2573 [ DOI: 10.11834/jig.220086 http://dx.doi.org/10.11834/jig.220086 ]