Li Qinrui, Lyu Xueqiang, Li Zhuo, Liu Kun. Logistic model for video caption enhancement[J]. Journal of Image and Graphics, 2014, 19(5): 683-692. DOI: 10.11834/jig.20140505.
Video captions contain abundant information related to the video content, and recognizing the text in caption images is the prerequisite for making full use of this information. Although the recognition accuracy of optical character recognition (OCR) software has improved, video captions with complex backgrounds still cannot be recognized well. Therefore, to improve recognition accuracy, the extracted caption should be enhanced, which reduces the complexity of the caption background and improves the contrast between background and text. In this paper,
we propose a method that fuses multi-frame information to enhance captions based on the Logistic model. The Logistic curve is a common S-shaped curve whose two ends each converge to a constant. By counting and analyzing the distribution of pixel values in single-background captions, we establish a Logistic model whose output is used as the enhanced caption's pixel values, so that their distribution remains generally consistent with that of a single-background caption. Owing to the convergence of the Logistic model, the majority of pixel values are assigned to 0 or 255, and a small number of gray points serve as transitions between black and white points. Therefore, the enhanced caption image not only preserves the continuity of pixel values but also improves the contrast between background and text. Then we detect and track the video caption
and align the same caption segments that appear in consecutive frames to obtain multi-frame information for each pixel. To reduce the complexity of the background, we analyze how the background changes in the time domain as well as the inherent characteristics of the background and text, and fuse them as the features of the Logistic model. Normalizing the model features over caption blocks, the unit of enhancement,
we take the result as the input parameter of the Logistic model. We select 60 videos with captions from the Paike column of Youku and divide the captions into three categories: special complex-background captions containing shadows or stroke effects, common complex-background captions, and single-background captions. We implement four caption enhancement methods: the single-frame OTSU adaptive thresholding method, the multiple-frame averaging method, the minimum pixel value search method, and the method proposed in this paper, and apply each method to every category of caption in our enhancement experiments. We use Hanwang OCR to recognize the enhanced captions and take the recognition accuracy as the evaluation metric of the enhancement effect. Experimental results show that the recognition accuracies of the three categories of captions are 81.76%,
97.13%
and 81.76%, respectively, after enhancement by the proposed method. Compared with the best results of the other three methods, the accuracies increased by 24.35%, 2.70%, and 2.70%, respectively. Thus, the proposed method can adapt to both complex-background and single-background captions. In particular, the enhancement of complex-background captions containing shadows or stroke effects shows a significant improvement. In this paper,
we propose a method that fuses multi-frame information to enhance captions based on the Logistic model. This method reduces the complexity of the caption background and improves the contrast between background and text, and the enhanced captions can be recognized well by OCR software. However, the parameters of the Logistic model are static values obtained by manual tuning. If the parameters could be adjusted dynamically according to the characteristics of different video captions, the recognition accuracy could be further improved.
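The core mapping the abstract describes — feeding fused multi-frame pixel features through a Logistic (sigmoid) curve so that most outputs saturate at 0 or 255, with a few gray transition values near the midpoint — might be sketched as follows. The specific fusion rule (equal-weight temporal minimum and mean) and the parameter values `k` and `midpoint` are illustrative assumptions, not the paper's tuned features or parameters.

```python
import math

def logistic_enhance(pixel_stacks, k=0.08, midpoint=128.0):
    """Map multi-frame pixel features through a Logistic curve.

    pixel_stacks: one list of gray values (0-255) per pixel, gathered
    from the same aligned caption region across consecutive frames.

    The per-pixel feature here is a simple fusion of the temporal
    minimum (text strokes stay stable across frames while moving
    backgrounds vary) and the temporal mean; this fusion and the
    parameters are illustrative, not the paper's tuned values.
    """
    enhanced = []
    for stack in pixel_stacks:
        feature = 0.5 * min(stack) + 0.5 * (sum(stack) / len(stack))
        # Logistic (S-shaped) mapping: the two ends converge to 0 and
        # 255, so most pixels are pushed to pure black or white, with
        # a small number of gray transition values near the midpoint.
        value = 255.0 / (1.0 + math.exp(-k * (feature - midpoint)))
        enhanced.append(round(value))
    return enhanced
```

With these assumed parameters, a pixel that stays dark across frames (e.g., `[20, 25, 22]`) maps to 0, a stably bright one (e.g., `[230, 240, 250]`) maps to 255, and only pixels whose fused feature lies near the midpoint remain gray, serving as the black-to-white transitions the abstract mentions.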