A review of adversarial examples for optical character recognition

Guo Kaiwei; Yang Kuiwu; Zhang Wanli; Hu Xuexian; Liu Wenzhao

doi:10.11834/jig.230412

2024, Vol. 29, No. 9, pp. 2672-2691
Print publication date: 2024-09-16

Guo Kaiwei, Yang Kuiwu, Zhang Wanli, Hu Xuexian, Liu Wenzhao. 2024. A review of adversarial examples for optical character recognition. Journal of Image and Graphics, 29(09): 2672-2691. DOI: 10.11834/jig.230412.

Abstract (translated from the Chinese version)

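Text recognition technology can be divided into optical character recognition (OCR) and scene text recognition (STR); STR grew out of OCR to cope with increasingly complex application scenarios. Powered by deep learning, OCR has advanced rapidly in recent years and has been commercially deployed at scale, but the adversarial example attacks that affect deep learning also pose a security threat to OCR. Most current OCR models cope poorly with natural perturbations and defend poorly against adversarial attacks; for example, their recognition accuracy drops sharply under noise-, watermark-, and gradient-based attack algorithms. Compared with the image domain, research on adversarial attacks in text recognition remains far from sufficient. Text recognition is usually treated as a sequence-to-sequence problem in which both the input (e.g., the pixels of an image) and the output (the characters those pixels depict) are sequences, which makes generating adversarial examples more challenging. This paper reviews adversarial attack and defense methods for text recognition, surveys and comparatively analyzes the attack methods proposed in recent years, and classifies them systematically by attack type, application scenario, and knowledge of the model. Specifically, by attack type they divide into gradient-based, optimization-based, and generative-model-based attacks; by application scenario, into OCR attacks and STR attacks; and by knowledge of the model, into white-box and black-box attacks. Besides reviewing attack methods, the paper briefly introduces defense techniques, grouped into data preprocessing, text tampering detection, and traditional adversarial defenses, which can help improve the security and robustness of text recognition models. Finally, it summarizes the challenges facing adversarial attacks and defenses in text recognition and discusses future directions.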

Abstract

With the rise of deep learning, an increasing number of fields are adopting deep and recurrent neural networks to build high-performance data-driven models. Text recognition is widely used in everyday applications such as autonomous driving, product retrieval, text translation, document recognition, and logistics sorting. Detecting and recognizing text in scene images can considerably reduce labor costs, improve work efficiency, and advance information intelligence, so research on text detection and recognition has both practical and scientific value. Text recognition has drawn on methods from image recognition and sequence networks, giving rise to approaches based on connectionist temporal classification (CTC) loss, approaches based on attention mechanisms, and end-to-end recognition. CTC- and attention-based approaches treat matching a text image to its character sequence as a sequence-to-sequence recognition problem, using an encoder to process the image. End-to-end methods merge the text detection and recognition modules into a unified model so that both modules can be trained simultaneously. Although deep learning has driven the development of optical character recognition (OCR), researchers have discovered a serious vulnerability in deep models: adding minute perturbations to an image can cause the model to make incorrect predictions. In applications that demand high performance, this phenomenon greatly hinders the deployment of deep models. Consequently, more and more researchers are studying how deep models respond to such perturbations. Before one can understand how a model resists perturbations, a key task is to find better mechanisms for generating them, i.e., to learn how to attack the model.
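The vulnerability described above is classically demonstrated with the fast gradient sign method (FGSM), which moves every pixel by ε along the sign of the loss gradient. The sketch below is not an OCR attack from the reviewed literature: it assumes a hypothetical single-logit linear model `s = w·x + b` that "reads the character correctly" when `s > 0`, with made-up weights and pixels. A real text-recognition attack would backpropagate a sequence loss such as CTC through the network with an autodiff framework.

```python
# FGSM on a toy scorer (illustrative sketch, not a real OCR model).
# Assumption: the model outputs one logit s = w.x + b and reads the character
# correctly when s > 0; x is a flattened grayscale patch with pixels in [0, 1].
import math
from typing import List


def score(w: List[float], b: float, x: List[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def loss_grad_wrt_input(w: List[float], b: float, x: List[float]) -> List[float]:
    """d/dx of the cross-entropy loss -log(sigmoid(s)) for the true label."""
    s = score(w, b, x)
    dloss_ds = -1.0 / (1.0 + math.exp(s))  # equals -(1 - sigmoid(s))
    return [dloss_ds * wi for wi in w]      # chain rule: ds/dx = w


def _sign(v: float) -> int:
    return (v > 0) - (v < 0)


def fgsm(w: List[float], b: float, x: List[float], eps: float) -> List[float]:
    """x_adv = clip(x + eps * sign(grad_x loss), 0, 1): one L-infinity step."""
    g = loss_grad_wrt_input(w, b, x)
    return [min(1.0, max(0.0, xi + eps * _sign(gi))) for xi, gi in zip(x, g)]


w, b = [1.0, -2.0, 0.5, 1.5], -0.5   # hypothetical weights
x = [0.6, 0.2, 0.5, 0.7]             # clean input, score 1.0 (> 0: correct)
x_adv = fgsm(w, b, x, eps=0.25)      # each pixel moves by at most 0.25
print(round(score(w, b, x), 3), round(score(w, b, x_adv), 3))  # 1.0 -0.25
```

Because this toy model is linear and no pixel hits the clipping bounds, one step lowers the score by exactly ε·‖w‖₁ = 0.25 × 5 = 1.25, which is enough to flip the decision even though no pixel changes by more than 0.25.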
Most current research therefore focuses on algorithms that generate perturbations efficiently. This article reviews and summarizes the adversarial attack methods proposed in the field of text recognition. Adversarial attacks are divided into three types: gradient-based, optimization-based, and generative-model-based. Each category is further divided into white-box and black-box techniques, depending on the attacker's level of access to the model parameters. In text recognition, a prevalent strategy is the watermark attack, which cleverly embeds perturbations within a watermark; this preserves the attack success rate while keeping the adversarial image natural-looking to a human observer. Other common methods include additive and erosion perturbations and minimal-pixel attacks. Attacks based on generative adversarial networks (GANs) have also contributed to research on English and Chinese font-style generation: they deceive machine learning models by producing examples that resemble the original data, and such examples can in turn be used to improve the robustness and reliability of OCR models. Research on Chinese font-style conversion falls into three categories: 1) stroke-trajectory-based methods, which generate new characters by analyzing the stroke trajectories inherent to Chinese script and exploit these distinctive stroke traits to produce characters with similar stylistic attributes, thereby achieving style transfer; 2) style-based methods, which generate Chinese characters in a specific style by learning the style features of various fonts; and 3) methods based on content and style features, which generate characters with a specific style and content by learning representations of content and style features. Adversarial attacks on OCR have prompted broader reflection on the security of neural networks.
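The watermark idea mentioned above can be sketched as an iterative signed-gradient attack whose updates are confined to a watermark mask. This is written in the spirit of watermark attacks on OCR and does not reproduce any published method: the linear scorer, weights, mask, step size `alpha`, and budget `eps` are all hypothetical. For a linear scorer the gradient of `s` with respect to `x` is simply `w`, so the loop descends the score directly.

```python
# Watermark-constrained iterative attack: a sketch in the spirit of
# watermark attacks on OCR, not a reproduction of any published method.
# Assumptions: a toy linear scorer s = w.x + b stands in for the recognizer
# (s > 0 means "read correctly"), and mask[i] == 1 marks watermark pixels.
from typing import List


def score(w: List[float], b: float, x: List[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def _sign(v: float) -> int:
    return (v > 0) - (v < 0)


def watermark_attack(w, b, x0, mask, eps=0.3, alpha=0.1, steps=10):
    """Signed-gradient steps on masked pixels only, within eps of x0 and [0, 1]."""
    x = list(x0)
    for _ in range(steps):
        if score(w, b, x) < 0:  # already misread: stop early
            break
        for i, (wi, mi) in enumerate(zip(w, mask)):
            if not mi:
                continue  # outside the watermark: never touched
            lo = max(0.0, x0[i] - eps)
            hi = min(1.0, x0[i] + eps)
            # ds/dx_i = w_i, so stepping against sign(w_i) lowers the score
            x[i] = min(hi, max(lo, x[i] - alpha * _sign(wi)))
    return x


w, b = [1.0, -2.0, 0.5, 1.5], -0.5  # hypothetical weights
x0 = [0.6, 0.2, 0.5, 0.7]           # clean input, score 1.0
mask = [0, 1, 0, 1]                 # watermark covers pixels 1 and 3
x_adv = watermark_attack(w, b, x0, mask)
print(round(score(w, b, x_adv), 3))  # -0.05
```

Shrinking the budget to `eps=0.2` leaves the score at about 0.3 in this toy setup, so the attack fails: confining the perturbation to the watermark buys visual naturalness at the cost of attack strength.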
Defense methods include data preprocessing, text tampering detection, and traditional adversarial example defenses. Finally, this review summarizes the challenges facing adversarial attacks and defenses in text recognition. In the future, the transition from white-box to black-box settings deserves considerable attention. For classification models, black-box queries are relatively straightforward: the attacker only needs the unnormalized logits output by the model's last layer. Sequence models, however, do not produce a single-step output, so mounting effective attacks with fewer queries remains a challenging problem. Large vision-language models such as GPT-4 have made considerable advances in response generation; nevertheless, their privacy and security concerns warrant attention, and the adversarial robustness of large models needs further research. This review aims to provide a comprehensive perspective for understanding and addressing adversarial problems in text recognition, to help strike the right balance between practicality and security, and to promote continued progress in the field.
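Data preprocessing, the first defense family listed above, can be illustrated with a 3×3 median filter, a classic way to remove sparse noise. This sketch assumes a tiny made-up binarized image (1 = ink) and a sparse perturbation of flipped pixels, the regime targeted by minimal-pixel attacks. Median filtering also erodes strokes thinner than the window, and preprocessing defenses can be bypassed by adaptive attacks that account for the transform, so this illustrates the idea rather than a robust defense.

```python
# Input-preprocessing defense: a 3x3 median filter on a binarized text image.
# Assumptions: the image is a small 0/1 grid (1 = ink), borders are handled by
# replicating edge pixels, and the attacker flips only a few isolated pixels.
from typing import List

Image = List[List[int]]


def _clamp(v: int, n: int) -> int:
    return max(0, min(n - 1, v))


def median3x3(img: Image) -> Image:
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = sorted(
                img[_clamp(r + dr, h)][_clamp(c + dc, w)]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
            )
            out[r][c] = window[4]  # median of the 9 values
    return out


# Hypothetical clean image: a 3-pixel-wide vertical stroke, 6 rows x 7 columns.
clean = [[1 if 2 <= c <= 4 else 0 for c in range(7)] for _ in range(6)]

# Sparse adversarial edit: one stray ink pixel plus one hole in the stroke.
adv = [row[:] for row in clean]
adv[1][0] = 1
adv[2][3] = 0

print(median3x3(adv) == clean)  # True in this toy example
```

A 1-pixel-wide stroke in the same setup would be erased entirely by the filter, which is the usual trade-off between removing perturbations and preserving legitimate glyph detail.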

Keywords

optical character recognition (OCR); scene text recognition (STR); adversarial examples; generative adversarial network (GAN); deep learning; sequence model
