Deep learning methods for scene text detection and recognition
Vol. 26, Issue 6, Pages: 1330-1367 (2021)
Received: 21 January 2021
Revised: 27 February 2021
Accepted: 6 March 2021
Published: 16 June 2021
DOI: 10.11834/jig.210044
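Many natural scene images contain rich text, which plays an important role in scene understanding. With the rapid development of mobile internet technology, many new applications, such as signboard recognition and autonomous driving, need to exploit this textual information. Consequently, the analysis and processing of scene text has increasingly become one of the research hotspots in computer vision. The task mainly comprises text detection and text recognition. Traditional text detection and recognition methods rely on hand-crafted features and rules, and their models are complex to design, inefficient, and generalize poorly. With the development of deep learning, scene text detection, scene text recognition, and end-to-end scene text detection and recognition have all made breakthrough progress, with significant gains in both performance and efficiency. This paper introduces the research background of the field. It classifies, organizes, and summarizes deep learning-based methods for scene text detection, scene text recognition, and end-to-end scene text detection and recognition, and explains the basic ideas, strengths, and weaknesses of each category. For the methods in each category, it further discusses and analyzes the algorithmic pipelines, applicable scenarios, and technical evolution of the main models. In addition, it describes several mainstream public datasets and compares the performance of the models on representative datasets. Finally, it summarizes the limitations of current scene text detection, recognition, and end-to-end detection and recognition algorithms on data from different scenarios, together with future challenges and development trends.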
With the rapid development of internet and mobile internet technologies, many new applications, such as signboard recognition and autonomous driving, require extensive use of the rich text information in natural scenes. Thus, the analysis and processing of scene text has become increasingly important and is now one of the research hotspots in computer vision. Traditional text detection and recognition methods often rely on manually designed features, which involve a large amount of computation and low efficiency, and they generalize poorly to complex scenes. With the development of deep learning in recent years, convolutional neural networks have driven great progress in scene text detection and recognition. These deep learning-based methods outperform traditional ones by a large margin and have become the mainstream approach to reading text in the wild.

Scene text detection methods can be divided into two categories according to their target objects: top-down methods and bottom-up methods. Top-down methods mainly inherit the basic idea of general object detection or instance segmentation and directly regress the entire bounding box of each text instance. In contrast, bottom-up methods, following the idea of traditional methods, first detect components of a text instance and then group them together using rules. Bottom-up methods handle text of arbitrary shapes and orientations more effectively than top-down methods and are less sensitive to text scale. However, grouping the detected components into different text instances requires complex design and processing, which makes the inference stage of bottom-up methods inefficient. These methods also have difficulty detecting long text, and adjacent text instances tend to stick together (text conglutination) when the text is dense. However, top-down methods do not suffer from this issue and can achieve higher detection precision.

In recent years,
recognizing text in natural scenes, known as scene text recognition (STR), has attracted great interest from both academia and industry. Specifically, the objective of STR is to translate a cropped text instance image into a target string sequence. Although optical character recognition (OCR) for scanned documents is well developed, STR remains challenging due to many factors, such as very complex backgrounds, diverse fonts, and imperfect imaging conditions. Early work relied on hand-crafted features, such as histogram of oriented gradients (HOG) descriptors, connected components, and the stroke width transform (SWT). However, the performance of these approaches is limited by the low representational capability of such features. In recent years, with the development of deep learning, the community has witnessed substantial advances. Deep learning-based scene text recognition approaches can be roughly divided into two branches, namely, segmentation-based approaches and segmentation-free approaches. Segmentation-based approaches attempt to locate each character in the input text instance image, apply a character classifier to recognize each character, and then group the characters into text lines to obtain the final recognition result. Segmentation-free approaches recognize the text instance image as a whole and directly map the entire image to a target string sequence. Both branches have their own advantages and limitations, so practitioners should choose the trade-off that best fits the needs of their application scenario. Although the practicality and efficiency of recognition approaches have improved significantly over the past decades, further research is still required on the generalization ability, evaluation protocols, and application scenarios of STR.

Finally,
end-to-end scene text spotting aims to combine text detection and text recognition into a unified system that can be optimized in a single pipeline. Bridging the gap between the detection branch and the recognition branch is the most essential problem in designing an end-to-end text spotting system. Similar to general object detection and instance segmentation, end-to-end text spotting methods can be divided into two categories, namely, two-stage methods and one-stage methods. Two-stage methods are mainly based on Faster R-CNN (region-based convolutional neural network) and Mask R-CNN, in which region of interest (RoI) pooling or RoIAlign acts as the bridge between the two branches. However, these operations may lose some information because the region proposals from the region proposal network (RPN) may not be accurate enough. One-stage methods follow a detection-then-recognition pipeline, with various carefully designed feature-alignment operations that strengthen the link between the detection and recognition branches. We categorize and summarize scene text detection and recognition methods and further elaborate on and analyze the basic ideas, advantages, and disadvantages of each. We aim to provide a reference for researchers and to support future work.
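Top-down detectors, like the general object detectors they build on, usually emit several overlapping candidate boxes per text instance, which are pruned before output. The abstract does not specify this step; the sketch below shows standard greedy non-maximum suppression (NMS) under simplifying assumptions: axis-aligned boxes given as (x1, y1, x2, y2) and an illustrative IoU threshold of 0.5. Detectors for oriented or curved text would need a rotated-box or polygon IoU instead.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), axis-aligned (assumption)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_thresh: float = 0.5) -> List[int]:
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping it too much, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep


if __name__ == "__main__":
    boxes = [(10, 10, 110, 40), (12, 11, 112, 42), (200, 50, 300, 80)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))  # → [0, 2]
```

The two near-duplicate boxes on the first word overlap with an IoU of about 0.87, so only the higher-scoring one survives, while the distant box is kept.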
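Bottom-up detectors, as described above, first detect components of a text instance and then group them with rules. The grouping rules differ from method to method and are not given in the abstract, so the following is only an illustrative sketch: the linking rule (at least 50% vertical overlap and a horizontal gap no larger than the mean component height) and the helper names `linked` and `group_components` are hypothetical. Linked components are merged with union-find, and each group is returned as its enclosing box.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def linked(a: Box, b: Box, max_gap_ratio: float = 1.0, min_v_overlap: float = 0.5) -> bool:
    """Hypothetical rule: components on the same row with a small horizontal gap."""
    v_overlap = min(a[3], b[3]) - max(a[1], b[1])
    min_h = min(a[3] - a[1], b[3] - b[1])
    if min_h <= 0 or v_overlap / min_h < min_v_overlap:
        return False
    gap = max(a[0], b[0]) - min(a[2], b[2])  # negative if the boxes overlap horizontally
    mean_h = ((a[3] - a[1]) + (b[3] - b[1])) / 2
    return gap <= max_gap_ratio * mean_h


def group_components(boxes: List[Box]) -> List[Box]:
    """Merge linked components with union-find; return one enclosing box per group."""
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if linked(boxes[i], boxes[j]):
                parent[find(i)] = find(j)

    groups: Dict[int, List[Box]] = {}
    for i, b in enumerate(boxes):
        groups.setdefault(find(i), []).append(b)
    return sorted(
        (min(b[0] for b in g), min(b[1] for b in g), max(b[2] for b in g), max(b[3] for b in g))
        for g in groups.values()
    )


if __name__ == "__main__":
    comps = [(0, 0, 10, 20), (15, 0, 25, 20), (30, 2, 40, 20), (100, 0, 110, 20), (0, 100, 10, 120)]
    print(group_components(comps))  # → [(0, 0, 40, 20), (0, 100, 10, 120), (100, 0, 110, 20)]
```

Hand-tuned thresholds like these illustrate why the abstract calls grouping complex: a gap threshold that joins the characters of one word can also join two neighbouring words in dense text, which is the conglutination problem mentioned above.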
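Segmentation-free recognizers map the whole text image to a string without locating individual characters. One widely used way to do this (attention-based decoders are the other main family) is to predict a class distribution for each horizontal frame of the image and train with connectionist temporal classification (CTC). The sketch below shows only CTC best-path (greedy) decoding at inference time: take the most likely class per frame, merge consecutive repeats, then drop blanks. The alphabet, the choice of index 0 as the blank, and the `one_hot` test helper are assumptions for illustration.

```python
from typing import List, Optional, Sequence

ALPHABET = "-abcdefghijklmnopqrstuvwxyz"  # hypothetical charset; index 0 ("-") is the CTC blank
BLANK = 0


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], alphabet: str = ALPHABET) -> str:
    """CTC best-path decoding: argmax per frame, merge consecutive repeats, drop blanks."""
    best = [max(range(len(p)), key=lambda k: p[k]) for p in frame_probs]
    chars: List[str] = []
    prev: Optional[int] = None
    for k in best:
        if k != prev and k != BLANK:  # a new non-blank label starts a new character
            chars.append(alphabet[k])
        prev = k  # a blank resets prev, so "l-l" decodes to two l's
    return "".join(chars)


def one_hot(labels: str, alphabet: str = ALPHABET) -> List[List[float]]:
    """Turn a per-frame label string into fake one-hot frame probabilities for testing."""
    return [[1.0 if c == ch else 0.0 for c in alphabet] for ch in labels]


if __name__ == "__main__":
    print(ctc_greedy_decode(one_hot("hhe-ll-lo")))  # → hello
```

The blank between the two runs of "l" is what lets CTC emit a genuine double letter; without it, "ll" would collapse to a single "l".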
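In two-stage spotters, RoI pooling or RoIAlign crops a fixed-size feature from the shared feature map for each proposal, and the recognition branch reads that crop. The sketch below is a simplified, single-channel RoIAlign using NumPy: it takes one bilinearly interpolated sample at the centre of each output bin instead of averaging several samples, treats feature cell (i, j) as lying at integer coordinates, and assumes an axis-aligned box. The function names and the `spatial_scale` parameter (the reciprocal of the feature-map stride) are illustrative. Unlike RoI pooling, RoIAlign does not round RoI coordinates to the feature grid.

```python
import numpy as np


def bilinear(feat: np.ndarray, y: float, x: float) -> float:
    """Bilinearly interpolate a 2-D feature map at continuous (y, x); cell (i, j) sits at (i, j)."""
    h, w = feat.shape
    y = min(max(y, 0.0), h - 1.0)  # clamp to the map so indices stay valid
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * feat[y0, x0] + dx * feat[y0, x1]
    bottom = (1 - dx) * feat[y1, x0] + dx * feat[y1, x1]
    return float((1 - dy) * top + dy * bottom)


def roi_align(feat: np.ndarray, box, out_h: int, out_w: int, spatial_scale: float = 1.0) -> np.ndarray:
    """Simplified RoIAlign: one bilinear sample at the centre of each output bin, no rounding."""
    x1, y1, x2, y2 = (c * spatial_scale for c in box)  # image coords -> feature-map coords
    bin_h = (y2 - y1) / out_h
    bin_w = (x2 - x1) / out_w
    out = np.empty((out_h, out_w), dtype=np.float64)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = bilinear(feat, y1 + (i + 0.5) * bin_h, x1 + (j + 0.5) * bin_w)
    return out


if __name__ == "__main__":
    feat = np.tile(np.arange(8.0), (8, 1))  # feat[y, x] == x, a simple horizontal ramp
    print(roi_align(feat, (1, 1, 5, 5), out_h=2, out_w=2))
```

On a linear ramp, bilinear interpolation is exact, so the 2×2 output simply reports the x coordinates of the bin centres (2 and 4), which makes the sampling positions easy to check.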
Almazán J, Gordo A, Fornés A and Valveny E. 2014. Word spotting and recognition with embedded attributes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(12): 2552-2566[DOI:10.1109/TPAMI.2014.2339814]
Baek J, Kim G, Lee J, Park S, Han D, Yun S, Oh S J and Lee H. 2019a. What is wrong with scene text recognition model comparisons? Dataset and model analysis//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 4714-4722[ DOI: 10.1109/ICCV.2019.00481 http://dx.doi.org/10.1109/ICCV.2019.00481 ]
Baek Y, Lee B, Han D, Yun S and Lee H. 2019b. Character region awareness for text detection//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 9365-9374[ DOI: 10.1109/CVPR.2019.00959 http://dx.doi.org/10.1109/CVPR.2019.00959 ]
Baek Y, Shin S, Baek J, Park S, Lee J, Nam D and Lee H. 2020. Character region attention for text spotting//Proceeding of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 504-521[ DOI: 10.1007/978-3-030-58526-6_30 http://dx.doi.org/10.1007/978-3-030-58526-6_30 ]
Bahdanau D, Cho K and Bengio Y. 2015. Neural machine translation by jointly learning to align and translate//Proceedings of the 3rd International Conference on Learning Representations. San Diego, USA: [s. n.]
Bissacco A, Cummins M, Netzer Y and Neven H. 2013. PhotoOCR: Reading text in uncontrolled conditions//Proceedings of 2013 IEEE International Conference on Computer Vision. Sydney, Australia: IEEE: 785-792[ DOI: 10.1109/ICCV.2013.102 http://dx.doi.org/10.1109/ICCV.2013.102 ]
Bookstein F L. 1989. Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(6): 567-585[DOI:10.1109/34.24792]
Breiman L. 2001. Random forests. Machine Learning, 45(1): 5-32[DOI:10.1023/A:1010933404324]
Busta M, Neumann L and Matas J. 2017. Deep textspotter: an end-to-end trainable scene text localization and recognition framework//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy, 2223-2231. [ DOI: 10.1109/ICCV.2017.242 http://dx.doi.org/10.1109/ICCV.2017.242 ].
Canny J. 1986. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-8(6): 679-698[DOI:10.1109/TPAMI.1986.4767851]
Casey R G and Lecolinet E. 1996. A survey of methods and strategies in character segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(7): 690-706[DOI:10.1109/34.506792]
Chen X X, Wang T W, Zhu Y Z, Jin L W and Luo C J. 2020. Adaptive embedding gate for attention-based scene text recognition. Neurocomputing, 381: 261-271[DOI:10.1016/j.neucom.2019.11.049]
Cheng Z Z, Bai F, Xu Y L, Zheng G, Pu S L and Zhou S G. 2017. Focusing attention: towards accurate text recognition in natural images//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 5086-5094[ DOI: 10.1109/ICCV.2017.543 http://dx.doi.org/10.1109/ICCV.2017.543 ]
Cheng Z Z, Xu Y L, Bai F, Niu Y, Pu S L and Zhou S G. 2018. AON: towards arbitrarily-oriented text recognition//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 5571-5579[ DOI: 10.1109/CVPR.2018.00584 http://dx.doi.org/10.1109/CVPR.2018.00584 ]
Ch'ng C K and Chan C S. 2017. Total-text: a comprehensive dataset for scene text detection and recognition//Proceedings of the 14th International Conference on Document Analysis and Recognition. Kyoto, Japan: IEEE: 935-942[ DOI: 10.1109/ICDAR.2017.157 http://dx.doi.org/10.1109/ICDAR.2017.157 ]
Chng C K, Liu Y L, Sun Y P, Ng C C, Luo C J, Ni Z H, Fang C M, Zhang S T, Han J Y, Ding E R, Liu J T, Karatzas D, Chan C S and Jin L W. 2019. ICDAR2019 robust reading challenge on arbitrary-shaped text-RRC-ArT//Proceedings of 2019 International Conference on Document Analysis and Recognition. Sydney, Australia: IEEE: 1571-1576[ DOI: 10.1109/ICDAR.2019.00252 http://dx.doi.org/10.1109/ICDAR.2019.00252 ]
Cho K, van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H and Bengio Y. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation//Proceedings of 2014 Empirical Methods in Natural Language Processing. Doha, Qatar: Association for Computational Linguistics: 1724-1734[ DOI: 10.3115/v1/D14-1179 http://dx.doi.org/10.3115/v1/D14-1179 ]
Cong F Z, Hu W P, Huo Q and Guo L. 2019. A comparative study of attention-based encoder-decoder approaches to natural scene text recognition//Proceedings of 2019 International Conference on Document Analysis and Recognition. Sydney, Australia: IEEE: 916-921[ DOI: 10.1109/ICDAR.2019.00151 http://dx.doi.org/10.1109/ICDAR.2019.00151 ]
Dai J F, Li Y, He K M and Sun J. 2016. R-FCN: object detection via region-based fully convolutional networks//Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona, Spain: ACM: 379-387
Dai Y C, Huang Z, Gao Y T, Xu Y X, Chen K, Guo J and Qiu W D. 2018. Fused text segmentation networks for multi-oriented scene text detection//Proceedings of the 24th International Conference on Pattern Recognition. Beijing, China: IEEE: 3604-3609[ DOI: 10.1109/ICPR.2018.8546066 http://dx.doi.org/10.1109/ICPR.2018.8546066 ]
Deng D, Liu H F, Li X L and Cai D. 2018. Pixellink: detecting scene text via instance segmentation//Proceedings of the 32nd AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th Innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18). New Orleans, USA: AAAI: 6773-6780
Dollár P, Appel R, Belongie S and Perona P. 2014. Fast feature pyramids for object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(8): 1532-1545[DOI:10.1109/TPAMI.2014.2300479]
Dong C, Loy C C, He K M and Tang X O. 2016. Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2): 295-307[DOI:10.1109/TPAMI.2015.2439281]
Epshtein B, Ofek E and Wexler Y. 2010. Detecting text in natural scenes with stroke width transform//Proceedings of 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco, USA: IEEE: 2963-2970[ DOI: 10.1109/cvpr.2010.5540041 http://dx.doi.org/10.1109/cvpr.2010.5540041 ]
Fang S C, Xie H T, Chen J J, Tan J L and Zhang Y D. 2019. Learning to draw text in natural images with conditional adversarial networks//Proceedings of the 28th International Joint Conference on Artificial Intelligence. Macao, China: IJCAI: 715-722[ DOI: 10.24963/ijcai.2019/101 http://dx.doi.org/10.24963/ijcai.2019/101 ]
Fang S C, Xie H T, Zha Z J, Sun N N, Tan J L and Zhang Y D. 2018. Attention and language ensemble for scene text recognition with convolutional sequence modeling//Proceedings of the 26th ACM International Conference on Multimedia. Seoul, Korea(South): ACM: 248-256[ DOI: 10.1145/3240508.3240571 http://dx.doi.org/10.1145/3240508.3240571 ]
Feng W, He W H, Yin F, Zhang X Y and Liu C L. 2019a. TextDragon: an end-to-end framework for arbitrary shaped text spotting//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 9076-9085[ DOI: 10.1109/ICCV.2019.00917 http://dx.doi.org/10.1109/ICCV.2019.00917 ]
Feng X J, Yao H X and Zhang S P. 2019b. Focal CTC loss for Chinese optical character recognition on unbalanced datasets. Complexity, 2019: #9345861[DOI:10.1155/2019/9345861]
Gao Y Z, Chen Y Y, Wang J Q, Tang M and Lu H Q. 2018. Dense chained attention network for scene text recognition//Proceedings of the 25th IEEE International Conference on Image Processing. Athens, Greece: IEEE: 679-683[ DOI: 10.1109/ICIP.2018.8451273 http://dx.doi.org/10.1109/ICIP.2018.8451273 ]
Gao Y Z, Chen Y Y, Wang J Q, Tang M and Lu H Q. 2019. Reading scene text with fully convolutional sequence modeling. Neurocomputing, 339: 161-170[DOI:10.1016/j.neucom.2019.01.094]
Goel V, Mishra A, Alahari K and Jawahar C V. 2013. Whole is greater than sum of parts: recognizing scene text words//Proceedings of the 12th International Conference on Document Analysis and Recognition. Washington, USA: IEEE: 398-402[ DOI: 10.1109/ICDAR.2013.87 http://dx.doi.org/10.1109/ICDAR.2013.87 ]
Gomez R, Shi B G, Gomez L, Numann L, Veit A, Matas J, Belongie S and Karatzas D. 2017. ICDAR2017 robust reading challenge on COCO-text//Proceeding of the 14th IAPR International Conference on Document Analysis and Recognition. Kyoto, Japan: IEEE: 1435-1443[ DOI: 10.1109/ICDAR.2017.234 http://dx.doi.org/10.1109/ICDAR.2017.234 ]
Goodfellow I J, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A and Bengio Y. 2014. Generative adversarial nets//Proceedings of the 27th Conference on Neural Information Processing Systems. Montreal, Canada: ACM: 2672-2680
Goodfellow I J, Warde-Farley D, Mirza M, Courville A and Bengio Y. 2013. Maxout networks//Proceedings of the 30th International Conference on Machine Learning. Atlanta, USA: ACM: 1319-1327
Gordo A. 2015. Supervised mid-level features for word image representation//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 2956-2964[ DOI: 10.1109/CVPR.2015.7298914 http://dx.doi.org/10.1109/CVPR.2015.7298914 ]
Graves A. 2012. Supervised sequence labelling//Graves A, ed. Supervised Sequence Labelling with Recurrent Neural Networks. Berlin, Heidelberg: Springer: 5-13[ DOI: 10.1007/978-3-642-24797-2_2 http://dx.doi.org/10.1007/978-3-642-24797-2_2 ]
Graves A and Jaitly N. 2014. Towards end-to-end speech recognition with recurrent neural networks//Proceedings of the 31st International Conference on Machine Learning. Bejing, China: JMLR: 1764-1772
Graves A, Liwicki M, Fernandez S, Bertolami R, Bunke H and Schmidhuber R. 2009. A novelconnectionist system for unconstrained handwriting recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(5): 855-868[DOI:10.1109/TPAMI.2008.137]
Graves A, Fernández S, Gomez F and Schmidhuber J. 2006. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks//Proceedings of the 23rd international conference on Machine learning. Pittsburgh, USA: ACM: 369-376[ DOI: 10.1145/1143844.1143891 http://dx.doi.org/10.1145/1143844.1143891 ]
Graves A, Mohamed A R and Hinton G. 2013. Speech recognition with deep recurrent neural networks//Proceedings of 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing. Vancouver, Canada: IEEE: 6645-6649[ DOI: 10.1109/ICASSP.2013.6638947 http://dx.doi.org/10.1109/ICASSP.2013.6638947 ]
Graves A and Schmidhuber J. 2005. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks, 18(5/6): 602-610[DOI:10.1016/j.neunet.2005.06.042]
Guo Q, Wang F L, Lei J, Tu D and Li G H. 2016. Convolutional feature learning and hybrid CNN-HMM for scene number recognition. Neurocomputing, 184: 78-90[DOI:10.1016/j.neucom.2015.07.135]
Gupta A, Vedaldi A and Zisserman A. 2016. Synthetic data for text localisation in natural images//Proceedings of 2016 IEEE conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 2315-2324[ DOI: 10.1109/CVPR.2016.254 http://dx.doi.org/10.1109/CVPR.2016.254 ]
He K M, Gkioxari G, Dollár P and Girshick R. 2017a. Mask R-CNN//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 2961-2969[ DOI: 10.1109/ICCV.2017.322 http://dx.doi.org/10.1109/ICCV.2017.322 ]
He K M, Zhang X Y, Ren S Q and Sun J. 2016a. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778[ DOI: 10.1109/CVPR.2016.90 http://dx.doi.org/10.1109/CVPR.2016.90 ]
He P, Huang W L, He T, Zhu Q L, Qiao Y and Li X L. 2017b. Single shot text detector with regional attention//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 3047-3055[ DOI: 10.1109/ICCV.2017.331 http://dx.doi.org/10.1109/ICCV.2017.331 ]
He P, Huang W L, Qiao Y, Loy C C and Tang X O. 2016b. Reading scene text in deep convolutional sequences//Proceedings of the 30th AAAI Conference on Artificial Intelligence. Phoenix, USA: AAAI: 3501-3508
He T, Huang W L, Qiao Y and Yao J. 2016c. Accurate text localization in natural image with cascaded convolutional text network[EB/OL]. [2021-01-21] . https://arxiv.org/pdf/1603.09423.pdf https://arxiv.org/pdf/1603.09423.pdf
He W H, Zhang X Y, Yin F and Liu C L. 2017c. Deep direct regression for multi-oriented scene text detection//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 745-753[ DOI: 10.1109/ICCV.2017.87 http://dx.doi.org/10.1109/ICCV.2017.87 ]
He X W, Yang Y, Shi B G and Bai X. 2019. VD-SAN: visual-densely semantic attention network for image caption generation. Neurocomputing, 328: 48-55[DOI:10.1016/j.neucom.2018.02.106]
He T, Tian Z, Huang W, Shen C, Qiao Y and Sun C. 2018. An end-to-end textspotter with explicit alignment and attention//Proceedings of 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018, 5020-5029[doi: 10.1109/CVPR.2018.00527 http://dx.doi.org/10.1109/CVPR.2018.00527 ].
Hochreiter S and Schmidhuber J. 1997. Long short-term memory. Neural Computation, 9(8): 1735-1780[DOI:10.1162/neco.1997.9.8.1735]
Hu H, Zhang C Q, Luo Y X, Wang Y Z, Han J Y and Ding E R. 2017. Wordsup: exploiting word annotations for character based text detection//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 4940-4949[ DOI: 10.1109/ICCV.2017.529 http://dx.doi.org/10.1109/ICCV.2017.529 ]
Hu W Y, Cai X C, Hou J, Yi S and Lin Z P. 2020. GTC: guided training of ctc towards efficient and accurate scene text recognition//Proceedings of the 34th AAAI Conference on Artificial Intelligence, AAAI 2020, the 32nd Innovative Applications of Artificial Intelligence Conference, IAAI 2020, the 10th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020. New York, USA: AAAI: 11005-11012
Huang G, Liu Z, Van Der Maaten L and Weinberger K Q. 2017. Densely connected convolutional networks//Proceedings of 2017 Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 4700-4708[ DOI: 10.1109/CVPR.2017.243 http://dx.doi.org/10.1109/CVPR.2017.243 ]
Huang L C, Yang Y, Deng Y F and Yu Y N. 2015. Densebox: unifying landmark localization with end to end object detection[EB/OL]. [2021-01-21] . https://arxiv.org/pdf/1509.04874.pdf https://arxiv.org/pdf/1509.04874.pdf
Huang W L, Lin Z, Yang J C and Wang J. 2013. Text localization in natural images using stroke feature transform and text covariance descriptors//Proceedings of 2013 International Conference on Computer Vision. Sydney, Australia: IEEE: 1241-1248[ DOI: 10.1109/ICCV.2013.157 http://dx.doi.org/10.1109/ICCV.2013.157 ]
Huang Y L, Sun Z H, Jin L W and Luo C J. 2020. EPAN: effective parts attention network for scene text recognition. Neurocomputing, 376: 202-213[DOI:10.1016/j.neucom.2019.10.010]
Jaderberg M, Simonyan K, Vedaldi A and Zisserman A. 2015a. Deep structured output learning for unconstrained text recognition//Proceedings of the 3rd International Conference on Learning Representations. San Diego, USA: [s. n.]
Jaderberg M, Simonyan K, Vedaldi A and Zisserman A. 2016. Reading text in the wild with convolutional neural networks. International Journal of Computer Vision, 116(1): 1-20[DOI:10.1007/s11263-015-0823-z]
Jaderberg M, Simonyan K, Zisserman A and Kavukcuoglu K. 2015b. Spatial transformer networks//Proceedings of the 28th International Conference on Neural Information Processing Systems. Montreal, Canada: ACM: 2017-2025
Jaderberg M, Vedaldi A and Zisserman A. 2014. Deep features for text spotting//Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer: 512-528[ DOI: 10.1007/978-3-319-10593-2_34 http://dx.doi.org/10.1007/978-3-319-10593-2_34 ]
Jiang Y Y, Zhu X Y, Wang X B, Yang S L, Li W, Wang H, Fu P and Luo Z B. 2018. R 2 CNN: rotational region CNN for Arbitrarily-oriented scene text detection//Proceedings of the 24th International Conference on Pattern Recognition. Beijing, China: IEEE: 3610-3615[ DOI: 10.1109/ICPR.2018.8545598 http://dx.doi.org/10.1109/ICPR.2018.8545598 ].
Karatzas D, Gomez-Bigorda L, Nicolaou A, Ghosh S, Bagdanov A, Iwamura M, Matas J, Neumann L, Chandrasekhar V R, Lu S J, Shafait F, Uchida S and Valveny E. 2015. ICDAR 2015 competition on robust reading//Proceedings of the 13th International Conference on Document Analysis and Recognition. Tunis, Tunisia: IEEE: 1156-1160[ DOI: 10.1109/ICDAR.2015.7333942 http://dx.doi.org/10.1109/ICDAR.2015.7333942 ]
Karatzas D, Shafait F, Uchida S, Iwamura M, Bigorda L G I, Mestre S R, Mas J, Mota D F, Almazàn J A and de las Heras L. 2013. ICDAR 2013 robust reading competition//Proceeding of the 12th International Conference on Document Analysis and Recognition. Washington, USA: IEEE: 1484-1493[ DOI: 10.1109/ICDAR.2013.221 http://dx.doi.org/10.1109/ICDAR.2013.221 ]
Kim K I, Jung K and Kim J H. 2003. Texture-based approach for text detection in images using support vector machines and continuously adaptive mean shift algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(12): 1631-1639[DOI:10.1109/TPAMI.2003.1251157]
LeCun Y, Bottou L, Bengio Y and Haffner P. 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11): 2278-2324[DOI:10.1109/5.726791]
Lee C Y and Osindero S. 2016. Recursive recurrent nets with attention modeling for OCR in the wild//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 2231-2239[ DOI: 10.1109/CVPR.2016.245 http://dx.doi.org/10.1109/CVPR.2016.245 ]
Li H, Wang P and Shen C H. 2017a. Towards end-to-end text spotting with convolutional recurrent neural networks//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 5238-5246[ DOI: 10.1109/ICCV.2017.560 http://dx.doi.org/10.1109/ICCV.2017.560 ]
Li H, Wang P, Shen C H and Zhang G Y. 2019. Show, attend and read: a simple and strong baseline for irregular text recognition//Proceedings ofthe 33rd Conference on Artificial Intelligence, AAAI 2019, the 31st Innovative Applications of Artificial Intelligence Conference, IAAI 2019, the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019. Honolulu, USA: AAAI: 8610-8617[ DOI: 10.1609/aaai.v33i01.33018610 http://dx.doi.org/10.1609/aaai.v33i01.33018610 ]
Li Y, Qi H Z, Dai J F, Ji X Y and Wei Y C. 2017b. Fully convolutional instance-aware semantic segmentation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 2359-2367[ DOI: 10.1109/CVPR.2017.472 http://dx.doi.org/10.1109/CVPR.2017.472 ]
Liang M and Hu X L. 2015. Recurrent convolutional neural network for object recognition//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 3367-3375[ DOI: 10.1109/CVPR.2015.7298958 http://dx.doi.org/10.1109/CVPR.2015.7298958 ]
Liao M H, Lyu P, He M H, Yao C, Wu W H and Bai X. 2021. Mask TextSpotter: an end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(2): 532-548[DOI:10.1109/TPAMI.2019.2937086]
Liao M H, Pang G, Huang J, Hassner T and Bai X. 2020a. Mask TextSpotter v3: segmentation proposal network for robust scene text spotting//Proceeding of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 706-722[ DOI: 10.1007/978-3-030-58621-8_41 http://dx.doi.org/10.1007/978-3-030-58621-8_41 ]
Liao M H, Shi B G and Bai X. 2018a. TextBoxes++: a single-shot oriented scene text detector. IEEE Transactions on Image Processing, 27(8): 3676-3690[DOI:10.1109/TIP.2018.2825107]
Liao M H, Shi B G, Bai X, Wang X G and Liu W Y. 2017. TextBoxes: a fast text detector with a single deep neural network//Proceedings of the 31st AAAI Conference on Artificial Intelligence. San Francisco, USA: ACM: 4161-4167
Liao M H, Wan Z Y, Yao C, Chen K and Bai X. 2020b. Real-time scene text detection with differentiable binarization//Proceedings of the 34th AAAI Conference on Artificial Intelligence, AAAI 2020, the 32nd Innovative Applications of Artificial Intelligence Conference, IAAI 2020, the 10th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020. New York, USA: AAAI: 11474-11481
Liao M H, Zhu Z, Shi B G, Xia G S and Bai X. 2018b. Rotation-sensitive regression for oriented scene text detection//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 5909-5918[ DOI: 10.1109/CVPR.2018.00619 http://dx.doi.org/10.1109/CVPR.2018.00619 ]
Litman R, Anschel O, Tsiper S, Litman R, Mazor S and Manmatha R. 2020. SCATTER: selective context attentional scene text recognizer//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 11959-11969[ DOI: 10.1109/CVPR42600.2020.01198 http://dx.doi.org/10.1109/CVPR42600.2020.01198 ]
Liu C L, Koga M and Fujisawa H. 2002. Lexicon-driven segmentation and recognition of handwritten character strings for Japanese address reading. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(11): 1425-1437[DOI:10.1109/TPAMI.2002.1046151]
Liu H, Jin S and Zhang C S. 2018a. Connectionist temporal classification with maximum entropy regularization//Proceedings of 2018 Annual Conference on Neural Information Processing Systems. Montréal, Canada: [s. n.]: 831-841
Liu J C, Liu X B, Sheng J, Liang D, Li X and Liu Q J. 2019a. Pyramid mask text detector[EB/OL]. [2021-01-21] . https://arxiv.org/pdf/1903.11800.pdf https://arxiv.org/pdf/1903.11800.pdf
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C Y and Berg A C. 2016a. SSD: single shot multibox detector//Proceeding of the 14th European Conference on Computer Vision. Amsterdam, The Netherlands: Spring: 21-37[ DOI: 10.1007/978-3-319-46448-0_2 http://dx.doi.org/10.1007/978-3-319-46448-0_2 ]
Liu W, Chen C F and Wong K Y K. 2018b. Char-Net: a character-aware neural network for distorted scene text recognition//Proceedings of the 32nd AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18). New Orleans, USA: AAAI: 7154-7161
Liu W, Chen C F, Wong K Y K, Su Z Z and Han J Y. 2016b. STAR-Net: a spatial attention residue network for scene text recognition//Proceedings of 2016 British Machine Vision Conference. York, UK: BMVA[ DOI: 10.5244/C.30.43 http://dx.doi.org/10.5244/C.30.43 ]
Liu X B, Liang D, Yan S, Chen D G, Qiao Y and Yan J J. 2018c. FOTS: fast oriented text spotting with a unified network//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 5676-5685[ DOI: 10.1109/CVPR.2018.00595 http://dx.doi.org/10.1109/CVPR.2018.00595 ]
Liu X H, Kawanishi T, Wu X M and Kashino K. 2016c. Scene text recognition with CNN classifier and WFST-based word labeling//Proceedings of the 23rd International Conference on Pattern Recognition. Cancun, Mexico: IEEE: 3999-4004[ DOI: 10.1109/ICPR.2016.7900259 http://dx.doi.org/10.1109/ICPR.2016.7900259 ]
Liu Y, Wang Z W, Jin H L and Wassell I. 2018d. Synthetically supervised feature learning for scene text recognition//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 449-465[ DOI: 10.1007/978-3-030-01228-1_27 http://dx.doi.org/10.1007/978-3-030-01228-1_27 ]
Liu Y L, Chen H, Shen C H, He T, Jin L W and Wang L W. 2020. ABCNet: real-timescene text spotting with adaptive bezier-curve network//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 9809-9818[ DOI: 10.1109/CVPR42600.2020.00983 http://dx.doi.org/10.1109/CVPR42600.2020.00983 ]
Liu Y L and Jin L W. 2017. Deep matching prior network: toward tighter multi-oriented text detection//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 1962-1969[ DOI: 10.1109/CVPR.2017.368 http://dx.doi.org/10.1109/CVPR.2017.368 ]
Liu Y L, Jin L W, Zhang S T, Luo C J and Zhang S. 2019b. Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition, 90: 337-345[DOI:10.1016/j.patcog.2019.02.002]
Liu Y L, Zhang S, Jin L W, Xie L L, Wu Y Q and WangZ P. 2019c. Omnidirectional scene text detection with sequential-free box discretization//Proceedings of the 28th International Joint Conference on Artificial Intelligence. Macao, China: [s. n.]: 3052-3058[ DOI: 10.24963/ijcai.2019/423 http://dx.doi.org/10.24963/ijcai.2019/423 ]
Liu Z C, Li Y X, Ren F B, Goh W L and Yu H. 2018e. SqueezedText: a real-time scene text recognition by binary convolutional encoder-decoder network//Proceedings of the 32nd AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th Innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18). New Orleans, USA: AAAI: 7194-7201
Liu Z C, Lin G S, Yang S, Liu F Y, Lin W S and Goh W L. 2019d. Towards robust curve text detection with conditional spatial expansion//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 7269-7278[DOI:10.1109/CVPR.2019.00744]
Liu J M, Zhang C Q, Sun Y P, Han J Y and Ding E R. 2018f. Detecting text in the wild with deep character embedding network//Proceedings of the 14th Asian Conference on Computer Vision. Perth, Australia: Springer: 501-517[DOI:10.1007/978-3-030-20870-7_31]
Long J, Shelhamer E and Darrell T. 2015. Fully convolutional networks for semantic segmentation//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 3431-3440[DOI:10.1109/CVPR.2015.7298965]
Long S B, Ruan J Q, Zhang W J, He X, Wu W H and Yao C. 2018. TextSnake: a flexible representation for detecting text of arbitrary shapes//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 19-35[DOI:10.1007/978-3-030-01216-8_2]
Luo C J, Jin L W and Sun Z H. 2019. MORAN: a multi-object rectified attention network for scene text recognition. Pattern Recognition, 90: 109-118[DOI:10.1016/j.patcog.2019.01.020]
Luo C J, Lin Q X, Liu Y L, Jin L W and Shen C H. 2021. Separating content from style using adversarial learning for recognizing text in the wild. International Journal of Computer Vision[DOI:10.1007/s11263-020-01411-1]
Luo C J, Zhu Y Z, Jin L W and Wang Y P. 2020. Learn to augment: joint data augmentation and network optimization for text recognition//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 13743-13752[DOI:10.1109/CVPR42600.2020.01376]
Lyu P, Liao M H, Yao C, Wu W H and Bai X. 2018b. Mask TextSpotter: an end-to-end trainable neural network for spotting text with arbitrary shapes//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 71-88[DOI:10.1007/978-3-030-01264-9_5]
Lyu P, Yao C, Wu W H, Yan S C and Bai X. 2018a. Multi-oriented scene text detection via corner localization and region segmentation//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7553-7563[DOI:10.1109/CVPR.2018.00788]
Ma J Q, Shao W Y, Ye H, Wang L, Wang H, Zheng Y B and Xue X Y. 2018. Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia, 20(11): 3111-3122[DOI:10.1109/TMM.2018.2818020]
Matas J, Chum O, Urban M and Pajdla T. 2004. Robust wide-baseline stereo from maximally stable extremal regions. Image and Vision Computing, 22(10): 761-767[DOI:10.1016/j.imavis.2004.02.006]
Miao Y, Gowayyed M and Metze F. 2015. EESEN: end-to-end speech recognition using deep RNN models and WFST-based decoding//Proceedings of 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU). Scottsdale, USA: IEEE: 167-174[DOI:10.1109/ASRU.2015.7404790]
Minetto R, Thome N, Cord M, Leite N J and Stolfi J. 2013. T-HOG: an effective gradient-based descriptor for single line text regions. Pattern Recognition, 46(3): 1078-1090[DOI:10.1016/j.patcog.2012.10.009]
Mishra A, Alahari K and Jawahar C V. 2012. Scene text recognition using higher order language priors//Proceedings of 2012 British Machine Vision Conference. Surrey, UK: BMVA Press: 1-11[DOI:10.5244/C.26.127]
Mishra A, Alahari K and Jawahar C V. 2016. Enhancing energy minimization framework for scene text recognition with top-down cues. Computer Vision and Image Understanding, 145: 30-42[DOI:10.1016/j.cviu.2016.01.002]
Mou Y Q, Tan L, Yang H, Chen J Y, Liu L Y, Yan R and Huang Y H. 2020. PlugNet: degradation aware scene text recognition supervised by a pluggable super-resolution unit//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 158-174[DOI:10.1007/978-3-030-58555-6_10]
Nayef N, Yin F, Bizid I, Choi H, Feng Y, Karatzas D, Luo Z B, Pal U, Rigaud C, Chazalon J, Khlif W, Luqman M M, Burie J C, Liu C L and Ogier J M. 2017. ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification-RRC-MLT//Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition. Kyoto, Japan: IEEE: 1454-1459[DOI:10.1109/ICDAR.2017.237]
Neumann L and Matas J. 2010. A method for text localization and recognition in real-world images//Proceedings of the 10th Asian Conference on Computer Vision. Queenstown, New Zealand: Springer: 770-783[DOI:10.1007/978-3-642-19318-7_60]
Neumann L and Matas J. 2012. Real-time scene text localization and recognition//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, USA: IEEE: 3538-3545[DOI:10.1109/CVPR.2012.6248097]
Phan T Q, Shivakumara P, Tian S X and Tan C L. 2013. Recognizing text with perspective distortion in natural scenes//Proceedings of 2013 IEEE International Conference on Computer Vision. Sydney, Australia: IEEE: 569-576[DOI:10.1109/ICCV.2013.76]
Qi X B, Chen Y H, Xiao R, Li C G, Zou Q and Cui S G. 2019. A novel joint character categorization and localization approach for character-level scene text recognition//Proceedings of 2019 International Conference on Document Analysis and Recognition Workshops. Sydney, Australia: IEEE: 83-90[DOI:10.1109/ICDARW.2019.40086]
Qiao L, Tang S L, Cheng Z Z, Xu Y L, Niu Y, Pu S L and Wu F. 2020a. Text perceptron: towards end-to-end arbitrary-shaped text spotting//Proceedings of the 34th AAAI Conference on Artificial Intelligence, AAAI 2020, the 32nd Innovative Applications of Artificial Intelligence Conference, IAAI 2020, the 10th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020. New York, USA: AAAI: 11899-11907
Qiao Z, Zhou Y, Yang D B, Zhou Y C and Wang W P. 2020b. SEED: semantics enhanced encoder-decoder framework for scene text recognition//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 13525-13534[DOI:10.1109/CVPR42600.2020.01354]
Qin S Y, Bissacco A, Raptis M, Fujii Y and Xiao Y. 2019. Towards unconstrained end-to-end text spotting//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 4704-4714[DOI:10.1109/ICCV.2019.00480]
Redmon J, Divvala S, Girshick R and Farhadi A. 2016. You only look once: unified, real-time object detection//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 779-788[DOI:10.1109/CVPR.2016.91]
Redmon J and Farhadi A. 2017. YOLO9000: better, faster, stronger//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 6517-6525[DOI:10.1109/CVPR.2017.690]
Ren S Q, He K M, Girshick R B and Sun J. 2015. Faster R-CNN: towards real-time object detection with region proposal networks//Proceedings of 2015 Annual Conference on Neural Information Processing Systems. Montreal, Canada: [s. n.]: 91-99
Risnumawan A, Shivakumara P, Chan C S and Tan C L. 2014. A robust arbitrary text detection system for natural scene images. Expert Systems with Applications, 41(18): 8027-8048[DOI:10.1016/j.eswa.2014.07.008]
Rodriguez-Serrano J A, Gordo A and Perronnin F. 2015. Label embedding: a frugal baseline for text recognition. International Journal of Computer Vision, 113(3): 193-207[DOI:10.1007/s11263-014-0793-6]
Ronneberger O, Fischer P and Brox T. 2015. U-Net: convolutional networks for biomedical image segmentation//Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich, Germany: Springer: 234-241[DOI:10.1007/978-3-319-24574-4_28]
Sheng F F, Chen Z N and Xu B. 2019. NRTR: a no-recurrence sequence-to-sequence model for scene text recognition//Proceedings of 2019 International Conference on Document Analysis and Recognition. Sydney, Australia: IEEE: 781-786[DOI:10.1109/ICDAR.2019.00130]
Shi B G, Bai X and Belongie S. 2017b. Detecting oriented text in natural images by linking segments//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 2550-2558[DOI:10.1109/CVPR.2017.371]
Shi B G, Bai X and Yao C. 2017a. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(11): 2298-2304[DOI:10.1109/TPAMI.2016.2646371]
Shi B G, Wang X G, Lyu P, Yao C and Bai X. 2016. Robust scene text recognition with automatic rectification//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 4168-4176[DOI:10.1109/CVPR.2016.452]
Shi B G, Yang M K, Wang X G, Lyu P, Yao C and Bai X. 2019. ASTER: an attentional scene text recognizer with flexible rectification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(9): 2035-2048[DOI:10.1109/TPAMI.2018.2848939]
Simonyan K and Zisserman A. 2015. Very deep convolutional networks for large-scale image recognition//Proceedings of the 3rd International Conference on Learning Representations. San Diego, USA: [s. n.]
Su B L and Lu S J. 2014. Accurate scene text recognition based on recurrent neural network//Proceedings of the 12th Asian Conference on Computer Vision. Singapore, Singapore: Springer: 35-48[DOI:10.1007/978-3-319-16865-4_3]
Su B L and Lu S J. 2017. Accurate recognition of words in scenes without character segmentation using recurrent neural network. Pattern Recognition, 63: 397-405[DOI:10.1016/j.patcog.2016.10.016]
Sun Y P, Liu J M, Liu W, Han J Y, Ding E R and Liu J T. 2019. Chinese street view text: large-scale Chinese text reading with partially supervised learning//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 9086-9095[DOI:10.1109/ICCV.2019.00918]
Szegedy C, Liu W, Jia Y Q, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V and Rabinovich A. 2015. Going deeper with convolutions//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 1-9[DOI:10.1109/CVPR.2015.7298594]
Tang J, Yang Z B, Wang Y P, Zheng Q, Xu Y C and Bai X. 2019. SegLink++: detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition, 96: #106954[DOI:10.1016/j.patcog.2019.06.020]
Tian S X, Lu S J and Li C S. 2017. WeText: scene text detection under weak supervision//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 1492-1500[DOI:10.1109/ICCV.2017.166]
Tian Z, Huang W L, He T, He P and Qiao Y. 2016. Detecting text in natural image with connectionist text proposal network//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, The Netherlands: Springer: 56-72[DOI:10.1007/978-3-319-46484-8_4]
Tian Z T, Shu M, Lyu P, Li R Y, Zhou C, Shen X Y and Jia J Y. 2019. Learning shape-aware embedding for scene text detection//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 4229-4238[DOI:10.1109/CVPR.2019.00436]
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser Ł and Polosukhin I. 2017. Attention is all you need//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: ACM: 5998-6008
Wan Z Y, Xie F M, Liu Y B, Bai X and Yao C. 2019. 2D-CTC for scene text recognition[EB/OL]. [2021-01-21]. https://arxiv.org/pdf/1907.09705.pdf
Wan Z, He M, Chen H, Bai X and Yao C. 2020. TextScanner: reading characters in order for robust scene text recognition//Proceedings of the 34th AAAI Conference on Artificial Intelligence, AAAI 2020, the 32nd Innovative Applications of Artificial Intelligence Conference, IAAI 2020, the 10th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020. New York, USA: AAAI
Wang C, Yin F and Liu C L. 2018a. Memory-augmented attention model for scene text recognition//Proceedings of the 16th International Conference on Frontiers in Handwriting Recognition. Niagara Falls, USA: IEEE: 62-67[DOI:10.1109/ICFHR-2018.2018.00020]
Wang F F, Zhao L M, Li X, Wang X C and Tao D C. 2018b. Geometry-aware scene text detection with instance transformation network//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 1381-1389[DOI:10.1109/CVPR.2018.00150]
Wang H, Lu P, Zhang H, Yang M K, Bai X, Xu Y C, He M C, Wang Y P and Liu W Y. 2020a. All you need is boundary: toward arbitrary-shaped text spotting//Proceedings of the 34th AAAI Conference on Artificial Intelligence, AAAI 2020, the 32nd Innovative Applications of Artificial Intelligence Conference, IAAI 2020, the 10th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020. New York, USA: AAAI: 12160-12167
Wang J F and Hu X L. 2017. Gated recurrent convolution neural network for OCR//Proceedings of 2017 Annual Conference on Neural Information Processing Systems. Long Beach, USA: [s. n.]: 335-344
Wang K, Babenko B and Belongie S. 2011. End-to-end scene text recognition//Proceedings of 2011 IEEE International Conference on Computer Vision. Barcelona, Spain: IEEE: 1457-1464[DOI:10.1109/ICCV.2011.6126402]
Wang P, Yang L, Li H, Deng Y Y, Shen C H and Zhang Y N. 2019b. A simple and robust convolutional-attention network for irregular text recognition[EB/OL]. [2021-01-21]. https://deepai.org/publication/a-simple-and-robust-convolutional-attention-network-for-irregular-text-recognition
Wang P F, Zhang C G, Qi F, Huang Z M, En M Y, Han J Y, Liu J T, Ding E R and Shi G M. 2019a. A single-shot arbitrarily-shaped text detector based on context attended multi-task learning//Proceedings of the 27th ACM International Conference on Multimedia. Nice, France: ACM: 1277-1285[DOI:10.1145/3343031.3350988]
Wang Q, Liu S T, Chanussot J and Li X L. 2019d. Scene classification with recurrent attention of VHR remote sensing images. IEEE Transactions on Geoscience and Remote Sensing, 57(2): 1155-1167[DOI:10.1109/TGRS.2018.2864987]
Wang Q Q, Jia W J, He X J, Lu Y, Blumenstein M, Huang Y and Lyu S. 2019c. ReELFA: a scene text recognizer with encoded location and focused attention//Proceedings of 2019 International Conference on Document Analysis and Recognition Workshops. Sydney, Australia: IEEE: 71-76[DOI:10.1109/ICDARW.2019.40084]
Wang S W, Wang Y T, Qin X R, Zhao Q J and Tang Z. 2019e. Scene text recognition via gated cascade attention//Proceedings of 2019 IEEE International Conference on Multimedia and Expo. Shanghai, China: IEEE: 1018-1023[DOI:10.1109/ICME.2019.00179]
Wang T, Wu D J, Coates A and Ng A Y. 2012. End-to-end text recognition with convolutional neural networks//Proceedings of the 21st International Conference on Pattern Recognition. Tsukuba, Japan: IEEE: 3304-3308
Wang T W, Zhu Y Z, Jin L W, Luo C J, Chen X X, Wu Y Q, Wang Q Y and Cai M X. 2020b. Decoupled attention network for text recognition//Proceedings of the 34th AAAI Conference on Artificial Intelligence, AAAI 2020, the 32nd Innovative Applications of Artificial Intelligence Conference, IAAI 2020, the 10th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020. New York, USA: AAAI
Wang W H, Xie E Z, Li X, Hou W B, Lu T, Yu G and Shao S. 2019f. Shape robust text detection with progressive scale expansion network//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 9336-9345[DOI:10.1109/CVPR.2019.00956]
Wang W H, Xie E Z, Song X G, Zang Y H, Wang W J, Lu T, Yu G and Shen C H. 2019g. Efficient and accurate arbitrary-shaped text detection with pixel aggregation network//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 8440-8449[DOI:10.1109/ICCV.2019.00853]
Wang W J, Xie E Z, Liu X B, Wang W H, Liang D, Shen C H and Bai X. 2020c. Scene text image super-resolution in the wild//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 650-666[DOI:10.1007/978-3-030-58607-2_38]
Wang X B, Jiang Y Y, Luo Z B, Liu C L, Choi H and Kim S. 2019h. Arbitrary shape scene text detection with adaptive text region representation//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 6449-6458[DOI:10.1109/CVPR.2019.00661]
Wang Y X, Xie H T, Zha Z J, Xing M T, Fu Z L and Zhang Y D. 2020d. ContourNet: taking a further step toward accurate arbitrary-shaped scene text detection//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 11753-11762[DOI:10.1109/CVPR42600.2020.01177]
Wu L, Zhang C Q, Liu J M, Han J Y, Liu J T, Ding E R and Bai X. 2019. Editing text in the wild//Proceedings of the 27th ACM International Conference on Multimedia. Nice, France: ACM: 1500-1508[DOI:10.1145/3343031.3350929]
Wu Y and Natarajan P. 2017. Self-organized text detection with minimal post-processing via border learning//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 5000-5009[DOI:10.1109/ICCV.2017.535]
Xiao S Y, Peng L R, Yan R J, An K Y, Yao G and Min J. 2020. Sequential deformation for accurate scene text detection//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 108-124[DOI:10.1007/978-3-030-58526-6_7]
Xie E Z, Zang Y H, Shao S, Yu G, Yao C and Li G Y. 2019a. Scene text detection with supervised pyramid context network//Proceedings of the 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, the 31st Innovative Applications of Artificial Intelligence Conference, IAAI 2019, the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019. Honolulu, USA: AAAI: 9038-9045
Xie H T, Fang S C, Zha Z J, Yang Y T, Li Y and Zhang Y D. 2019b. Convolutional attention networks for scene text recognition. ACM Transactions on Multimedia Computing, Communications, and Applications, 15(1S): #3[DOI:10.1145/3231737]
Xie Z C, Huang Y X, Zhu Y Z, Jin L W, Liu Y L and Xie L L. 2019c. Aggregation cross-entropy for sequence recognition//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 6531-6540[DOI:10.1109/CVPR.2019.00670]
Xing L J, Tian Z, Huang W L and Scott M R. 2019. Convolutional character networks//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 9126-9136[DOI:10.1109/ICCV.2019.00922]
Xu Y C, Wang Y K, Zhou W, Wang Y P, Yang Z B and Bai X. 2019. TextField: learning a deep direction field for irregular scene text detection. IEEE Transactions on Image Processing, 28(11): 5566-5579[DOI:10.1109/TIP.2019.2900589]
Xue C H, Lu S J and Zhan F N. 2018. Accurate scene text detection through border semantics awareness and bootstrapping//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 355-372[DOI:10.1007/978-3-030-01270-0_22]
Xue C H, Lu S J and Zhang W. 2019. MSR: multi-scale shape regression for scene text detection//Proceedings of the 28th International Joint Conference on Artificial Intelligence. Macao, China: [s. n.]: 989-995
Yang M K, Guan Y S, Liao M H, He X, Bian K G, Bai S, Yao C and Bai X. 2019. Symmetry-constrained rectification network for scene text recognition//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 9147-9156[DOI:10.1109/ICCV.2019.00924]
Yang Q P, Cheng M L, Zhou W M, Chen Y, Qiu M H and Lin W. 2018. IncepText: a new inception-text module with deformable PSROI pooling for multi-oriented scene text detection//Proceedings of the 27th International Joint Conference on Artificial Intelligence. Stockholm, Sweden: ACM: 1071-1077
Yang X, He D F, Zhou Z H, Kifer D and Giles C L. 2017. Learning to read irregular text with attention mechanisms//Proceedings of the 26th International Joint Conference on Artificial Intelligence. Melbourne, Australia: ACM: 3280-3286
Yao C, Bai X and Liu W Y. 2014a. A unified framework for multioriented text detection and recognition. IEEE Transactions on Image Processing, 23(11): 4737-4749[DOI:10.1109/TIP.2014.2353813]
Yao C, Bai X, Liu W Y, Ma Y and Tu Z W. 2012. Detecting texts of arbitrary orientations in natural images//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, USA: IEEE: 1083-1090[DOI:10.1109/CVPR.2012.6247787]
Yao C, Bai X, Sang N, Zhou X Y, Zhou S C and Cao Z M. 2016. Scene text detection via holistic, multi-channel prediction[EB/OL]. [2021-01-21]. https://arxiv.org/pdf/1606.09002.pdf
Yao C, Bai X, Shi B G and Liu W Y. 2014b. Strokelets: a learned multi-scale representation for scene text recognition//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE: 4042-4049[DOI:10.1109/CVPR.2014.515]
Yin F, Wu Y C, Zhang X Y and Liu C L. 2017. Scene text recognition with sliding convolutional character models[EB/OL]. [2021-01-06]. https://arxiv.org/pdf/1709.07727.pdf
Yu D L, Li X, Zhang C Q, Liu T, Han J Y, Liu J T and Ding E R. 2020. Towards accurate scene text recognition with semantic reasoning networks//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 12110-12119[DOI:10.1109/CVPR42600.2020.01213]
Yue X Y, Kuang Z H, Lin C H, Sun H B and Zhang W. 2020. RobustScanner: dynamically enhancing positional clues for robust text recognition//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 135-151[DOI:10.1007/978-3-030-58529-7_9]
Zhan F N and Lu S J. 2019. ESIR: end-to-end scene text recognition via iterative image rectification//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 2059-2068[DOI:10.1109/CVPR.2019.00216]
Zhan F N, Lu S J and Xue C H. 2018. Verisimilar image synthesis for accurate detection and recognition of texts in scenes//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 249-266[DOI:10.1007/978-3-030-01237-3_16]
Zhan F N, Zhu H Y and Lu S J. 2019. Spatial fusion GAN for image synthesis//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 3653-3662[DOI:10.1109/CVPR.2019.00377]
Zhang C H, Gupta A and Zisserman A. 2020a. Adaptive text recognition through visual matching//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 51-67[DOI:10.1007/978-3-030-58517-4_4]
Zhang C Q, Liang B R, Huang Z M, En M Y, Han J Y, Ding E R and Ding X H. 2019a. Look more than once: an accurate detector for text of arbitrary shapes//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 10552-10561[DOI:10.1109/CVPR.2019.01080]
Zhang H, Yao Q M, Yang M K, Xu Y C and Bai X. 2020b. AutoSTR: efficient backbone search for scene text recognition//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 751-767[DOI:10.1007/978-3-030-58586-0_44]
Zhang S X, Zhu X B, Hou J B, Liu C, Yang C, Wang H F and Yin X C. 2020c. Deep relational reasoning graph network for arbitrary shape text detection//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 9699-9708[DOI:10.1109/CVPR42600.2020.00972]
Zhang Y P, Nie S, Liu W J, Xu X, Zhang D X and Shen H T. 2019b. Sequence-to-sequence domain adaptation network for robust text image recognition//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 2735-2744[DOI:10.1109/CVPR.2019.00285]
Zhang Z, Zhang C Q, Shen W, Yao C, Liu W Y and Bai X. 2016. Multi-oriented text detection with fully convolutional networks//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 4159-4167[DOI:10.1109/CVPR.2016.451]
Zhong Y, Karu K and Jain A K. 1995. Locating text in complex color images. Pattern Recognition, 28(10): 1523-1535[DOI:10.1016/0031-3203(95)00030-4]
Zhong Z Y, Jin L W and Huang S P. 2017. DeepText: a new approach for text proposal generation and text detection in natural images//Proceedings of 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). New Orleans, USA: IEEE: 1208-1212[DOI:10.1109/ICASSP.2017.7952348]
Zhong Z Y, Sun L and Huo Q. 2019. An anchor-free region proposal network for Faster R-CNN-based text detection approaches. International Journal on Document Analysis and Recognition, 22(3): 315-327[DOI:10.1007/s10032-019-00335-y]
Zhou X Y, Yao C, Wen H, Wang Y Z, Zhou S C, He W R and Liang J J. 2017. EAST: an efficient and accurate scene text detector//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 5551-5560[DOI:10.1109/CVPR.2017.283]
Zhu Y W, Wang S L, Huang Z and Chen K. 2019. Text recognition in images based on transformer with hierarchical attention//Proceedings of 2019 IEEE International Conference on Image Processing. Taipei, China: IEEE: 1945-1949[DOI:10.1109/ICIP.2019.8803203]
Zhu Y X and Du J. 2021. TextMountain: accurate scene text detection via instance segmentation. Pattern Recognition, 110: #107336[DOI:10.1016/j.patcog.2020.107336]
Zitnick C L and Dollar P. 2014. Edge boxes: locating object proposals from edges//Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer: 391-405[DOI:10.1007/978-3-319-10602-1_26]