Image colorization via color query and feature enhancement
2025, pp. 1-15
Online publication date: 2025-02-17
Accepted: 2025-02-13
DOI: 10.11834/jig.240506
Yu Bing, Xiang Xue, Fan Zhenghui, et al. Image colorization via color query and feature enhancement[J]. Journal of Image and Graphics,
Objective
Image colorization has important applications in old-photo restoration and black-and-white film enhancement. Existing methods cannot guarantee color consistency during color prediction and lack fine-grained handling of local details, which leads to poor colorization in certain color regions.
Method
The proposed method adopts an encoder-decoder structure: the encoder extracts features from the grayscale image, and the decoder restores the spatial resolution. The color prediction network refines color queries with rich visual features, and a pixel enhancement module learns spatial attention to strengthen pixels in specific regions. Furthermore, a feature enhancement module optimizes the color-matching relationship between the original and generated images, comprehensively capturing image features to achieve detail-preserving colorization and reduce color bleeding.
Result
Experiments compared the proposed method with five state-of-the-art automatic colorization methods for grayscale images on the ImageNet (val5k), ImageNet (val50k), COCO-Stuff, ADE20K, and CelebA-HQ datasets. In the objective quality comparison, relative to the second-best model, the proposed method reduces the Frechet Inception Distance (FID) by 0.2 and improves the Peak Signal-to-Noise Ratio (PSNR) by 0.13 dB. In the color-metric comparison, the proposed method achieves the highest colorfulness score. In the subjective evaluation and user study, its colorization results agree well with human aesthetic perception and receive the best ratings. In addition, ablation studies further confirm that the proposed model structure is effective in improving colorization performance.
Conclusion
The proposed colorization model better captures and reproduces the details and color relationships in an image, achieving high-quality colorization results.
Objective
Image colorization refers to the process of predicting plausible colors for each pixel in a grayscale image; the goal is not necessarily an exact restoration of the original colors. Because the same object can often be assigned different colors, colorization is inherently multimodal. This complexity both challenges researchers and sustains interest in the field. As an important direction in computer vision, image colorization has gained significant attention, particularly with the advances in deep learning. Traditional colorization methods often rely on user input, such as scribble-based or reference-image-based techniques, which, while effective, are limited by their inability to handle batch processing and by long processing times. To address these issues and reduce manual intervention, deep learning techniques have propelled the development of automatic colorization. Fully automatic methods eliminate the need for user interaction and can colorize images efficiently; however, challenges remain, including the accuracy of color restoration and the preservation of image details. Approaches based on Convolutional Neural Networks (CNNs) and Generative Adversarial Networks (GANs) have improved colorization performance but still suffer from insufficient semantic understanding and blurred details. Recently, Transformer models have shown promise in colorization tasks by capturing long-range dependencies, further improving results. Nevertheless, existing methods still struggle with color bleeding and loss of detail clarity, especially in highly detailed images. Achieving fully automated, natural, and plausible colorization remains an open research problem.
Method
In this study, we propose an end-to-end grayscale image colorization method utilizing an encoder-decoder architecture for fully automatic colorization. Given an input grayscale image, the network predicts the chrominance channels in the CIELAB color space to generate a colorized image. The encoder employs ConvNeXt to leverage its multi-scale semantic representation capabilities, effectively extracting high-level semantic features from the grayscale image. Multi-scale feature maps are passed from the encoder to the decoder through convolutional connection layers, progressively restoring the image's spatial resolution. These feature maps are then fed into the color prediction network, where a Pixel Enhancement Block (PEB) refines the color predictions. The PEB is designed to focus on and enhance specific regions of the image, improving color matching accuracy by utilizing convolutional layers and pooling operations to generate spatial attention weights. These weights are element-wise multiplied with the original image, enabling spatial enhancement and better capturing important regions for improved color precision. The Color Query Block (CQB) in the color prediction network employs a Transformer-based approach, incorporating learnable color embedding memories that store sequences of color representations. Through cross-attention and self-attention mechanisms, color embeddings are progressively correlated with image features, reducing dependence on manual priors and improving sensitivity to semantic information. This approach mitigates issues such as color bleeding. Furthermore, to enhance the learning of both color information and latent features from the grayscale image, the Feature Enhancement Block (FEB) generates attention maps using convolutional operations with varying kernel sizes. These maps are fused with the original image through convolutional layers to produce the final output tensor. 
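The abstract describes the Pixel Enhancement Block only at a high level: convolutions and pooling produce spatial attention weights that are multiplied element-wise with the features. As an illustration only, the following NumPy sketch mimics that mechanism; a fixed average filter stands in for the learned convolution, and the function name `spatial_attention` and all shapes are hypothetical, not the paper's implementation.

```python
import numpy as np

def spatial_attention(feat, kernel=3):
    """Toy pooling-based spatial attention: pool across channels,
    smooth with an average filter (standing in for a learned conv),
    squash with a sigmoid, and reweight the features element-wise.

    feat: array of shape (C, H, W).
    """
    # Channel-wise average and max pooling -> two (H, W) maps.
    avg_pool = feat.mean(axis=0)
    max_pool = feat.max(axis=0)
    pooled = avg_pool + max_pool

    # Naive average filter as a stand-in for the learned convolution.
    pad = kernel // 2
    padded = np.pad(pooled, pad, mode="edge")
    smoothed = np.zeros_like(pooled)
    H, W = pooled.shape
    for i in range(H):
        for j in range(W):
            smoothed[i, j] = padded[i:i + kernel, j:j + kernel].mean()

    # Sigmoid -> per-pixel attention weights in (0, 1).
    attn = 1.0 / (1.0 + np.exp(-smoothed))

    # Broadcast the (H, W) weights over all C channels.
    return feat * attn[None, :, :]
```

Because the weights lie in (0, 1), the block can only attenuate or (nearly) preserve activations; in the real module the learned convolution decides which regions keep their magnitude.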
This methodology ensures effective reconstruction of both color and structural information, thereby enhancing the overall performance of the image colorization process.
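The core of the Color Query Block, learnable color embeddings refined by attending over image features, can be illustrated with a single cross-attention step. This is a hedged sketch under assumptions (single head, residual update, names `cross_attention`/`softmax`), not the paper's actual layer stack:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, features):
    """One cross-attention step between color queries and image
    features: each color embedding attends over all spatial
    positions and is updated with the attended context.

    queries:  (N, D) learnable color embeddings.
    features: (HW, D) flattened image features.
    """
    d = queries.shape[-1]
    scores = queries @ features.T / np.sqrt(d)   # (N, HW) similarities
    weights = softmax(scores, axis=-1)           # attention over pixels
    context = weights @ features                 # (N, D) attended features
    return queries + context                     # residual update
```

Stacking such steps (interleaved with self-attention among the queries, as the method describes) lets the color embeddings specialize to the semantic regions they will ultimately color.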
Result
In the experiments, the proposed colorization model was trained on the large-scale ImageNet dataset and extensively evaluated across multiple benchmark datasets, including ImageNet (val5k), ImageNet (val50k), COCO-Stuff, ADE20K, and CelebA-HQ. The evaluation metrics included Frechet Inception Distance (FID), Colorfulness Score (CF), and Peak Signal-to-Noise Ratio (PSNR), which assess the realism, color quality, and reconstruction accuracy of the generated images. The model used a pre-trained ConvNeXt-L as the encoder, paired with a multi-scale decoder, and was optimized with the AdamW optimizer. All experiments were performed on four Tesla A100 GPUs. Comparative results showed that the proposed method significantly outperformed existing approaches such as DeOldify, Wu et al., BigColor, ColorFormer, and DDColor, particularly in color richness and realism. Quantitative comparisons across the five test datasets further demonstrated that the proposed model consistently achieved the lowest FID scores, indicating superior image quality and strong generalization. Compared to the second-best model, FID is reduced by 0.2 and PSNR is improved by 0.13 dB. Although some previous methods achieved higher CF scores, a higher colorfulness score does not always correlate with better visual quality. To address this, the metric ΔCF was introduced to measure the difference in colorfulness between generated and real images. The proposed method achieved the lowest ΔCF scores across all datasets, reflecting its ability to generate more natural and realistic colorizations while preserving image diversity. Given the subjective nature of image colorization, a user study was also conducted, showing that over 30% of users preferred the colorization results produced by the proposed method. Additionally, ablation studies confirmed the effectiveness of the model architecture in enhancing colorization performance.
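The colorfulness comparison above can be made concrete. The sketch below implements the standard Hasler-Suesstrunk colorfulness score (Hasler and Suesstrunk, 2003) and the colorfulness gap between a generated image and its ground truth; the exact normalization used in the paper's CF and ΔCF may differ.

```python
import numpy as np

def colorfulness(img):
    """Hasler-Suesstrunk colorfulness of an RGB image
    (array of shape (H, W, 3), values in [0, 255])."""
    R = img[..., 0].astype(float)
    G = img[..., 1].astype(float)
    B = img[..., 2].astype(float)
    rg = R - G                  # red-green opponent channel
    yb = 0.5 * (R + G) - B      # yellow-blue opponent channel
    std_root = np.sqrt(rg.std() ** 2 + yb.std() ** 2)
    mean_root = np.sqrt(rg.mean() ** 2 + yb.mean() ** 2)
    return std_root + 0.3 * mean_root

def delta_cf(generated, real):
    """Absolute colorfulness gap between a colorized image and the
    ground-truth color image (smaller = more natural saturation)."""
    return abs(colorfulness(generated) - colorfulness(real))
```

A grayscale image (R = G = B) scores exactly zero, which is why raw CF rewards over-saturation and the gap ΔCF is the more meaningful indicator of natural color.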
Conclusion
The proposed colorization model better captures and reproduces the details and color relationships in an image, achieving high-quality colorization results.
Antic J. 2019. DeOldify: A deep learning based project for colorizing and restoring old images (and video!) [EB/OL]. [2024-08-17]. https://github.com/jantic/DeOldify
Ba J L, Kiros J R and Hinton G E. 2016. Layer normalization [EB/OL]. [2024-08-17]. https://arxiv.org/abs/1607.06450
Caesar H, Uijlings J and Ferrari V. 2018. COCO-Stuff: Thing and stuff classes in context // Proceedings of 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 1209-1218 [DOI: 10.1109/CVPR.2018.00132]
Chen Y, Zhao Y, Zhang X J and Liu X P. 2024. Sketch colorization with finite color space prior. Journal of Image and Graphics, 29(4): 978-988 [DOI: 10.11834/jig.230189]
Cheng Z Z, Yang Q X and Sheng B. 2015. Deep colorization // Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 415-423 [DOI: 10.1109/ICCV.2015.111]
Chia A Y S, Zhuo S J, Gupta R K, Tai Y W, Cho S Y, Tan P and Lin S. 2011. Semantic colorization with internet images. ACM Transactions on Graphics, 30(6): 1-8 [DOI: 10.1145/2070781.2024190]
Hasler D and Suesstrunk S E. 2003. Measuring colorfulness in natural images // Human Vision and Electronic Imaging VIII. Santa Clara, USA: SPIE, 5007: 87-96 [DOI: 10.1117/12.477378]
Heusel M, Ramsauer H, Unterthiner T, Nessler B and Hochreiter S. 2017. GANs trained by a two time-scale update rule converge to a local Nash equilibrium // Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates, Inc.: 6626-6637 [DOI: 10.5555/3295222.3295408]
Hua X, Shu T, Li M X, Shi Y and Hong H Y. 2024. Nonlocal feature representation-embedded blurred image restoration. Journal of Image and Graphics, 29(10): 3033-3046 [DOI: 10.11834/jig.230735]
Huynh-Thu Q and Ghanbari M. 2008. Scope of validity of PSNR in image/video quality assessment. Electronics Letters, 44(13): 800-801 [DOI: 10.1049/el:20080522]
Isola P, Zhu J Y, Zhou T H and Efros A A. 2017. Image-to-image translation with conditional adversarial networks // Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 1125-1134 [DOI: 10.1109/CVPR.2017.632]
Ji X Z, Jiang B Y, Luo D H, Tao G P, Chu W Q, Xie Z F, Wang C J and Tai Y. 2022. ColorFormer: Image colorization via color memory assisted hybrid-attention transformer // Proceedings of the 17th European Conference on Computer Vision. Tel Aviv, Israel: Springer: 20-36 [DOI: 10.1007/978-3-031-19787-1_2]
Deng J, Dong W, Socher R, Li L J, Li K and Fei-Fei L. 2009. ImageNet: A large-scale hierarchical image database // Proceedings of 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami, USA: IEEE: 248-255 [DOI: 10.1109/CVPR.2009.5206848]
Karras T, Aila T, Laine S and Lehtinen J. 2018. Progressive growing of GANs for improved quality, stability, and variation // Proceedings of 2018 International Conference on Learning Representations. Vancouver, Canada [DOI: 10.48550/arXiv.1710.10196]
Kang X Y, Yang T, Ouyang W Q, Ren P, Li L Z and Xie X S. 2023. DDColor: Towards photo-realistic image colorization via dual decoders // Proceedings of 2023 IEEE/CVF International Conference on Computer Vision. Paris, France: IEEE: 328-338 [DOI: 10.1109/ICCV51070.2023.00037]
Kim G, Kang K, Kim S, Lee H, Kim S, Kim J, Baek S H and Cho S. 2022. BigColor: Colorization using a generative color prior for natural images // Proceedings of the 17th European Conference on Computer Vision. Tel Aviv, Israel: Springer: 350-366 [DOI: 10.1007/978-3-031-20071-7_21]
Kumar M, Weissenborn D and Kalchbrenner N. 2021. Colorization transformer // Proceedings of 2021 International Conference on Learning Representations. Vienna, Austria [DOI: 10.48550/arXiv.2102.04432]
Liang Z X, Li Z C, Zhou S C, Li C Y and Loy C C. 2024. Control Color: Multimodal diffusion-based interactive image colorization [EB/OL]. https://arxiv.org/abs/2402.10855
Liu X P, Wan L, Qu Y G, Wong T T, Lin S, Leung C S and Heng P A. 2008. Intrinsic colorization. ACM Transactions on Graphics, 27(5): 1-9 [DOI: 10.1145/1409060.1409119]
Liu Z, Mao H Z, Wu C Y, Feichtenhofer C, Darrell T and Xie S N. 2022. A ConvNet for the 2020s // Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 11976-11986 [DOI: 10.1109/CVPR52688.2022.01167]
Loshchilov I and Hutter F. 2017. Decoupled weight decay regularization [EB/OL]. [2024-08-17]. https://arxiv.org/abs/1711.05101
Qu Y G, Wong T T and Heng P A. 2006. Manga colorization. ACM Transactions on Graphics, 25(3): 1214-1220 [DOI: 10.1145/1141911.1142017]
Rombach R, Blattmann A, Lorenz D, Esser P and Ommer B. 2022. High-resolution image synthesis with latent diffusion models // Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 10684-10695 [DOI: 10.1109/CVPR52688.2022.01042]
Simonyan K and Zisserman A. 2014. Very deep convolutional networks for large-scale image recognition [DOI: 10.48550/arXiv.1409.1556]
Szegedy C, Vanhoucke V, Ioffe S, Shlens J and Wojna Z. 2016. Rethinking the Inception architecture for computer vision // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 2818-2826 [DOI: 10.1109/CVPR.2016.308]
Su J W, Chu H K and Huang J B. 2020. Instance-aware image colorization // Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 7968-7977 [DOI: 10.1109/CVPR42600.2020.00799]
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser L and Polosukhin I. 2017. Attention is all you need // Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: 5998-6008 [DOI: 10.48550/arXiv.1706.03762]
Wu Y Z, Wang X T, Li Y, Zhang H L, Zhao X and Shan Y. 2021. Towards vivid and diverse image colorization with generative color prior // Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 14377-14386 [DOI: 10.1109/ICCV48922.2021.01411]
Xu R S, Tu Z Z, Du Y Q, Dong X Y, Li J L, Meng Z B, Ma J Q, Bovik A and Yu H K. 2023. Pik-Fix: Restoring and colorizing old photos // Proceedings of 2023 IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa, USA: IEEE: 1724-1734 [DOI: 10.1109/WACV56688.2023.00177]
Zhang R, Isola P and Efros A A. 2016. Colorful image colorization // Proceedings of the 14th European Conference on Computer Vision. Amsterdam, The Netherlands: Springer: 649-666 [DOI: 10.1007/978-3-319-46487-9_40]
Zhou B, Zhao H, Puig X, Fidler S, Barriuso A and Torralba A. 2017. Scene parsing through ADE20K dataset // Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 633-641 [DOI: 10.1109/CVPR.2017.544]
Zhang L, Rao A and Agrawala M. 2023. Adding conditional control to text-to-image diffusion models // Proceedings of the IEEE/CVF International Conference on Computer Vision. Paris, France: IEEE: 3836-3847