Caption text in video plays an important role in video retrieval and browsing, because it provides highly condensed information about the video's content. However, caption text can only be used after it has been extracted from the video. According to Morrone's phase congruency theory, image features such as edges, shadows and bars always occur at points of maximum phase congruency, and the maxima of local energy occur at these same points. Based on this theory, this paper presents a video caption text segmentation approach. Rather than computing local energy as Morrone does, we extend Morrone's approach by constructing a quadrature pair of filters from a biorthogonal wavelet via the Hilbert transform. The local energy is then calculated from the multiresolution-decomposed image in octave bands. Using the relationship between local energy and the edge features of caption text, we then segment the image by local energy projection. This is the first step of caption segmentation; colour segmentation can afterwards be applied to its results. The experiments in this paper show that this approach achieves good caption region segmentation results.
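The local energy step can be illustrated with a short NumPy sketch. It is not the paper's implementation, and it rests on several assumptions that the abstract does not state. The LeGall 5/3 (CDF 5/3) analysis high-pass filter stands in for the biorthogonal wavelet. Octave bands come from à trous dilation of that filter rather than from a decimated decomposition. The odd (quadrature) response is computed with an FFT-based Hilbert transform of the even response. The energies from horizontal and vertical filtering at every band are simply summed. All function names are hypothetical.

```python
import numpy as np

# Assumption: LeGall 5/3 analysis high-pass taps stand in for the paper's
# biorthogonal wavelet; the filter is even-symmetric.
EVEN_TAPS = np.array([-0.5, 1.0, -0.5])


def dilate(taps, level):
    """Insert 2**level - 1 zeros between taps (a trous): one octave per level."""
    step = 2 ** level
    out = np.zeros((len(taps) - 1) * step + 1)
    out[::step] = taps
    return out


def filter_along(img, taps, axis):
    """Convolve every line along `axis` with `taps`, replicating border pixels."""
    half = len(taps) // 2
    pad = [(0, 0)] * img.ndim
    pad[axis] = (half, half)
    padded = np.pad(img, pad, mode="edge")
    return np.apply_along_axis(lambda v: np.convolve(v, taps, mode="valid"), axis, padded)


def analytic_signal(x, axis):
    """FFT-based analytic signal x + i*H(x) along one axis (periodic boundary)."""
    n = x.shape[axis]
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    shape = [1] * x.ndim
    shape[axis] = n
    return np.fft.ifft(np.fft.fft(x, axis=axis) * weights.reshape(shape), axis=axis)


def local_energy(img, levels=3):
    """Sum of sqrt(even**2 + odd**2) over octave bands and both image axes."""
    img = np.asarray(img, dtype=float)
    energy = np.zeros_like(img)
    for level in range(levels):
        taps = dilate(EVEN_TAPS, level)
        for axis in (0, 1):
            even = filter_along(img, taps, axis)  # even-symmetric filter response
            # The Hilbert transform commutes with convolution, so H(img * f)
            # equals img * H(f): the odd member of the quadrature pair.
            energy += np.abs(analytic_signal(even, axis))  # sqrt(even**2 + odd**2)
    return energy
```

Applied to a one-row step image (`step = np.zeros((1, 64)); step[0, 32:] = 1.0`), `local_energy(step)` peaks at the step, which agrees with the theory that local energy maxima mark edges. Summing bands is one simple way to combine scales. Taking the per-pixel maximum across bands would be an equally plausible choice.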
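The abstract does not say how "local energy projection" is carried out. A common reading is a projection profile: sum the energy map along rows to find horizontal text bands, then sum along columns inside each band to bound the text horizontally. The sketch below follows that reading and is only an assumption. The functions `runs_above` and `projection_segment`, and the parameters `ratio` (threshold as a fraction of the peak projection) and `min_size`, are hypothetical.

```python
import numpy as np


def runs_above(profile, threshold):
    """Return (start, end) pairs, end exclusive, where profile > threshold."""
    mask = np.asarray(profile) > threshold
    runs, start = [], None
    for i, flag in enumerate(mask):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(mask)))
    return runs


def projection_segment(energy, ratio=0.5, min_size=2):
    """Candidate caption boxes (top, bottom, left, right) from energy projections.

    Hypothetical rule: rows whose summed energy exceeds `ratio` times the peak
    row sum form bands; inside each band the column projection is thresholded
    the same way. Runs shorter than `min_size` pixels are discarded.
    """
    energy = np.asarray(energy, dtype=float)
    boxes = []
    row_profile = energy.sum(axis=1)  # horizontal projection
    for top, bottom in runs_above(row_profile, ratio * row_profile.max()):
        if bottom - top < min_size:
            continue
        col_profile = energy[top:bottom].sum(axis=0)  # vertical projection in band
        for left, right in runs_above(col_profile, ratio * col_profile.max()):
            if right - left >= min_size:
                boxes.append((top, bottom, left, right))
    return boxes
```

Take a synthetic energy map that is zero except for a block at rows 12 to 15 and columns 5 to 29. For that map, `projection_segment` returns `[(12, 16, 5, 30)]`. In the approach described above, colour segmentation would then be applied inside each such box.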