Abstract: This paper summarizes the significant methods and techniques for video abstraction and systematically presents all aspects of this technology, in order to evaluate research progress properly and to orient further in-depth research in the field. The main research techniques are introduced in detail, and typical algorithms are discussed in terms of the two main steps of abstraction generation, namely, video content analysis and abstraction generation. The paper then analyzes the progress of video abstraction over the past five years and describes the new demands in this field, that is, real-time and multi-view video abstraction. Evaluation benchmarks for video abstraction are also analyzed, and two solutions to the difficulty of obtaining video semantics are proposed. Finally, the paper discusses future development directions of video summary technology. As an important part of understanding video content, video abstraction has great research value. To date, however, it still has defects that need to be addressed, particularly in video semantic representation and abstraction evaluation benchmarks.
Keywords: video content analysis; abstraction generation; real-time video abstraction; multi-view video abstraction; video semantic acquisition
Abstract: The local binary pattern (LBP) is a theoretically simple yet highly efficient texture descriptor. Because of its discriminative power and computational simplicity, LBP has attracted increasing attention and has been successfully applied in image analysis, computer vision, and pattern recognition, most notably in the traditional pattern recognition problems of texture classification and face recognition. Considering the theoretical and practical value of LBP, this study comprehensively reviews the suitability of various LBP variants for texture classification and face recognition. First, the fundamentals of the traditional LBP method and of various LBP variants are reviewed in detail, and the pros and cons of the variants are discussed by dividing them into categories under a novel systematic framework. Second, LBP-based texture classification and LBP-based face recognition, the two most typical and successful applications of the approach, are reviewed. Finally, implicit directions for future LBP research are presented. The LBP method continues to be a hot research topic in computer vision and pattern recognition, as evidenced by the continuing emergence of new low-storage, fast local binary descriptors and of new LBP applications.
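To make the fundamentals above concrete, the following is a minimal NumPy sketch of the classical LBP operator (P=8, R=1) and its histogram descriptor. It is a generic illustration under the standard definition, not any particular variant reviewed in the paper.

```python
import numpy as np

def lbp_8_1(img):
    """Compute the basic 3x3 LBP code for each interior pixel."""
    img = img.astype(np.float64)
    center = img[1:-1, 1:-1]
    # Eight neighbors, enumerated clockwise from the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy : img.shape[0] - 1 + dy,
                       1 + dx : img.shape[1] - 1 + dx]
        codes |= (neighbor >= center).astype(np.uint8) << bit
    return codes

def lbp_histogram(img):
    """Normalized 256-bin histogram of LBP codes: the texture descriptor."""
    hist = np.bincount(lbp_8_1(img).ravel(), minlength=256)
    return hist / hist.sum()
```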
Abstract: Traditional image interpolation methods separate an image into many sub-blocks. As a result, the quality of the recovered image becomes poor, particularly at block boundaries and image borders. To eliminate this side effect, we present a new algorithm based on the 2D all-phase digital signal processing method. First, an algorithm for 2D all-phase windowed processing is designed to derive three kinds of interpolation kernels, based on the DFT, DCT, and DWT. The three kernels, which are built directly from the 2D transmission property, can process the frequency distribution along the row and column directions as well as along the diagonal directions. Both the transmission property and the base window should be constrained to ensure phase linearity. The proposed algorithm effectively eliminates blocking effects and solves the problem of uncontrollable content in the diagonal directions that arises with traditional techniques. In addition, the proposed algorithm improves the PSNR of the recovered images by at least 3 dB. Comparison experiments between the proposed algorithm and six traditional methods show that the proposed algorithm is physically explicit and applicable.
Abstract: Image deblurring regularization models easily smooth edges and produce the staircase effect; to address this problem, a new image deblurring model and a Newton projection iterative algorithm are proposed. The Hessian matrix comprises fidelity and regularization terms, and without a special structure its inverse is difficult and expensive to compute. To overcome this shortcoming, a Newton projection iterative algorithm with a block diagonal Hessian matrix is proposed. The fidelity term is described by the L norm. The regularization term is established by treating the bounded variation function as a variable of a compound function, and the energy function of the regularization model is established. The regularization model is converted into an augmented energy function by using the potential function. A preconditioning matrix is constructed to make the Hessian matrix block diagonal and easy to compute. A backtracking line search and an improved Barzilai-Borwein step length update criterion are adopted to ensure convergence and to prevent the Newton projection iteration from being trapped in a local optimum. Simulations show that the problem is effectively solved by the proposed method. Compared with other regularization-based image deblurring models, the proposed model achieves better image improvement: it effectively protects image edges, alleviates the staircase effect, and achieves lower relative error and deviation as well as higher peak signal-to-noise ratio and structural similarity index measure.
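The step-length machinery described above can be illustrated in isolation. The sketch below shows a projected descent step with a Barzilai-Borwein step length and an Armijo backtracking line search; the objective `f`, its gradient `grad`, and the [0, 1] box constraint are placeholder assumptions, not the paper's deblurring model.

```python
import numpy as np

def project(x, lo=0.0, hi=1.0):
    """Projection onto the box constraint [lo, hi] (e.g., valid pixel range)."""
    return np.clip(x, lo, hi)

def bb_projected_descent(f, grad, x0, iters=100, beta=0.5, c=1e-4):
    """Projected descent with a BB step length and Armijo backtracking."""
    x, g, alpha = x0, grad(x0), 1.0
    for _ in range(iters):
        x_new = project(x - alpha * g)
        # Backtracking line search: shrink the step until the Armijo
        # sufficient-decrease condition for projected gradient holds.
        for _ in range(30):
            if f(x_new) <= f(x) - c / alpha * np.sum((x_new - x) ** 2):
                break
            alpha *= beta
            x_new = project(x - alpha * g)
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        # Barzilai-Borwein step length update for the next iteration.
        alpha = np.sum(s * s) / max(np.sum(s * y), 1e-12)
        x, g = x_new, g_new
    return x
```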
Abstract: Focus detection and multi-focus conflict are challenging issues in image retargeting based on fisheye transformation. To solve them, a new image retargeting method that uses an improved fisheye transformation technique is proposed in this study. The proposed method calculates high-energy seams on the basis of the source image energy; all high-energy seams are found in the high-energy part of the image, which is the focus area. The source image is then retargeted to the target image through a fisheye transformation driven by seams instead of image regions; that is, the transformation mode of the fisheye warping method is changed and applied to image retargeting. Experimental results show that our method addresses the problems of fisheye-transformation-based image retargeting and that the resulting target images exhibit good visual effects. Users' subjective satisfaction score is about 4, and the algorithm runs fast: when a 512×384 source image is retargeted to half its length, the running time is 6 s. The method preserves the advantage of fisheye warping in retaining the focused areas of the image without discarding unfocused content, and it uses an optimal high-energy seam finding scheme to address focus detection and multi-focus conflict. The visual effect of the target images and the subjective satisfaction of users indicate that the proposed approach is a viable solution for image retargeting based on fisheye warping.
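The high-energy seam selection above builds on a seam-carving-style energy map and dynamic-programming seam search. Below is a minimal sketch that finds one vertical seam of maximal cumulative energy; the gradient-magnitude energy and the single-seam search are illustrative simplifications of the paper's scheme.

```python
import numpy as np

def energy_map(gray):
    """Simple gradient-magnitude energy of a grayscale image."""
    gy, gx = np.gradient(gray.astype(np.float64))
    return np.abs(gx) + np.abs(gy)

def max_energy_seam(e):
    """Return one vertical seam (one column index per row) of maximal energy."""
    h, w = e.shape
    cum = e.copy()
    for y in range(1, h):
        left = np.r_[-np.inf, cum[y - 1, :-1]]    # upper-left neighbor
        right = np.r_[cum[y - 1, 1:], -np.inf]    # upper-right neighbor
        cum[y] += np.maximum(np.maximum(left, cum[y - 1]), right)
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmax(cum[-1]))
    for y in range(h - 2, -1, -1):                # backtrack from the bottom
        lo, hi = max(seam[y + 1] - 1, 0), min(seam[y + 1] + 2, w)
        seam[y] = lo + int(np.argmax(cum[y, lo:hi]))
    return seam
```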
Abstract: Variational methods for removing multiplicative noise from images suffer from the staircase effect. We analyze the characteristics and correlations of several classical variational models for multiplicative denoising and, considering the frequency characteristics of the fractional differential, propose a fractional-order convex variational model for removing multiplicative Gamma noise. Our model is the fractional-order generalization of the classical I-divergence variational model. Based on duality theory, a fractional-order primal-dual algorithm is proposed to solve the model, and the range of the algorithm parameter that guarantees convergence is given according to saddle-point theory. Because classical variational numerical algorithms would need to compute the derivative of a non-differentiable function (the fractional-order regularization term), we adopt a resolvent-based primal-dual algorithm as an alternative solution. In the frequency domain, experiments verify that, compared with the classical first-order variational model, the proposed fractional-order model is effective in relaxing the staircase effect while preserving medium-frequency texture information in a cardiac ultrasound image and high-frequency building-edge information in the "Cameraman" image. The proposed fractional-order primal-dual algorithm also converges effectively and exhibits a fast convergence speed. Experimental results indicate that the proposed model can effectively improve the visual quality of denoised images with minimal loss of image detail.
Keywords: multiplicative noise; variational method; fractional-order differential; primal-dual algorithm; saddle-point model
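For reference, the sketch below shows the classical first-order primal-dual (Chambolle-Pock) iteration that the proposed fractional-order algorithm generalizes, applied here to ordinary TV-regularized least-squares denoising; the first-order gradient and quadratic fidelity are simplifications, not the paper's I-divergence model.

```python
import numpy as np

def grad(u):
    """Forward-difference discrete gradient with Neumann boundary."""
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:, :-1] = u[:, 1:] - u[:, :-1]
    gy[:-1, :] = u[1:, :] - u[:-1, :]
    return gx, gy

def div(px, py):
    """Discrete divergence, the negative adjoint of grad."""
    dx = np.zeros_like(px); dy = np.zeros_like(py)
    dx[:, 0] = px[:, 0]; dx[:, 1:-1] = px[:, 1:-1] - px[:, :-2]; dx[:, -1] = -px[:, -2]
    dy[0, :] = py[0, :]; dy[1:-1, :] = py[1:-1, :] - py[:-2, :]; dy[-1, :] = -py[-2, :]
    return dx + dy

def primal_dual_tv(f, lam=0.1, iters=200):
    """Chambolle-Pock iteration for TV-regularized least-squares denoising."""
    tau = sigma = 1.0 / np.sqrt(8.0)   # tau * sigma * ||grad||^2 <= 1
    x = f.copy(); x_bar = f.copy()
    px = np.zeros_like(f); py = np.zeros_like(f)
    for _ in range(iters):
        gx, gy = grad(x_bar)
        px, py = px + sigma * gx, py + sigma * gy
        norm = np.maximum(1.0, np.sqrt(px**2 + py**2) / lam)   # dual projection
        px, py = px / norm, py / norm
        x_old = x
        x = (x + tau * div(px, py) + tau * f) / (1.0 + tau)    # primal prox
        x_bar = 2.0 * x - x_old                                # extrapolation
    return x
```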
Abstract: Traditional fast mode decision schemes are commonly designed on the basis of feature analysis of the source videos. However, the rate-distortion behavior of the coding modes changes when channel errors are involved. In this study, we propose a fast mode decision scheme for H.264 video coding that addresses the requirements of low complexity and error resilience in real-time video communications. We examine the end-to-end rate-distortion (R-D) behavior of the various coding modes and derive a hierarchical mode decision scheme: the coding costs of the skip and intra modes in a packet-loss environment are estimated quickly, and the mode decision is then narrowed to a non-intra or non-skip path. The proposed algorithm is implemented in the H.264 reference software JM12, with a classical error-resilient video coding algorithm that exhaustively searches all coding modes used as the reference. By efficiently skipping a large number of motion estimations and intra prediction mode decisions, the algorithm saves approximately 50% of the encoding time without deterioration in R-D performance. Experimental results show that the proposed fast mode decision algorithm, based on the estimation of end-to-end R-D cost, achieves significant time savings while retaining good image quality.
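A heavily simplified, self-contained sketch of the hierarchical decision flow described above follows. The toy cost model, the threshold EARLY_SKIP_RATIO, and the mode lists are illustrative assumptions, not the JM12 implementation.

```python
EARLY_SKIP_RATIO = 0.6   # illustrative threshold (assumption)
INTER_MODES = ["16x16", "16x8", "8x16", "8x8"]
INTRA_MODES = ["I4x4", "I16x16"]

def end_to_end_cost(mode, distortion, rate, plr, lam=1.0):
    """Toy end-to-end R-D cost: channel loss inflates expected distortion
    for non-intra modes, which can propagate errors from lost references."""
    error_propagation = plr if mode not in INTRA_MODES else 0.0
    return distortion * (1.0 + error_propagation) + lam * rate

def decide_mode(costs, plr):
    """costs: dict mode -> (distortion, rate) from a fast pre-analysis."""
    skip_cost = end_to_end_cost("SKIP", *costs["SKIP"], plr)
    intra_cost = min(end_to_end_cost(m, *costs[m], plr) for m in INTRA_MODES)
    if skip_cost < EARLY_SKIP_RATIO * intra_cost:
        candidates = ["SKIP"] + INTER_MODES        # non-intra path
    else:
        candidates = INTER_MODES + INTRA_MODES     # non-skip path
    return min(candidates, key=lambda m: end_to_end_cost(m, *costs[m], plr))
```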
Abstract: Using the Cohen-Or method to harmonize a color-rich image severely alters the original colors and thus produces a processed image that differs noticeably from the original; moreover, deciding on the optimal type of color harmonization template is time consuming. This study introduces skewness and kurtosis, which describe the deviation and peakedness of a distribution, to calculate the range of the hue value H for the gray area of a harmonization template and to categorize the seven harmonization templates into X and T types. Following the principle that the optimal template minimizes the gray-area loss, the computed H range is applied to the gray area of the X- or T-type harmonization template. A normalized Gaussian function is then used to migrate colors into the gray area of the adjusted harmonization template, thereby harmonizing the original image. Experimental results show that an image harmonized with the proposed method is highly similar to the original, retains the original colors, and considerably reduces the harmonization time: for a 450×423 image, the processing time drops from 2549.78 s to 13.8693 s. Because the harmonized image retains the color style of the original, the proposed method also facilitates grayscale image colorization, image color transfer, and industrial product style transfer.
Keywords: image harmonization; skewness; kurtosis; color transfer
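The two statistics that drive the template selection above are standard distribution moments. A minimal sketch, assuming they are computed over an image's hue values:

```python
import numpy as np

def skewness(x):
    """Third standardized moment: asymmetry (deviation) of the distribution."""
    x = np.asarray(x, dtype=np.float64).ravel()
    m, s = x.mean(), x.std()
    return np.mean((x - m) ** 3) / s ** 3

def kurtosis(x):
    """Fourth standardized moment minus 3: peakedness (steep degree)."""
    x = np.asarray(x, dtype=np.float64).ravel()
    m, s = x.mean(), x.std()
    return np.mean((x - m) ** 4) / s ** 4 - 3.0
```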
Abstract: Blur evaluation is an image quality assessment process that estimates the perceived sharpness or blurriness of images output by imaging systems or processing algorithms. It has numerous practical uses and plays a central role in the design, optimization, and testing of image acquisition, transmission, and analysis algorithms and systems. A blur evaluation method based on image sparse representation and probabilistic latent semantic analysis (pLSA) is proposed to incorporate the unsupervised learning characteristic and the hierarchical feature abstraction model of the human visual system into the evaluation process. The method rests on the hypothesis that images possess latent characteristics that can be used to measure image quality and on the fact that the human brain can learn in an unsupervised manner. The pLSA model is used to identify meaningful topics that are latent in the sparse codes of natural images and of the test image, and the similarity of the latent topics between the training images and the test image is used to measure blurriness. The proposed method has three crucial stages: dictionary construction, learning, and blur metric computation. The dictionary construction stage builds the dictionary from clear sample images. The learning stage extracts average topics from the sparse codes of clear training images by using pLSA. The blur metric computation stage calculates the blur metric as the correlation coefficient between the latent topics of the test image and the average topics of the clear training images. Experimental results on synthetic images and public image quality databases show that the proposed method outperforms state-of-the-art blur metrics in terms of monotonicity, anti-noise capability, and the evaluation metrics suggested by the Video Quality Experts Group: the Pearson correlation and Spearman rank-order correlation coefficients are approximately 0.9956 and 0.9934, respectively. These results show that the proposed method can evaluate the amount of blurriness in images with high accuracy and that it correlates well with the human visual system.
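The final stage above reduces to a single correlation. A minimal sketch, assuming the topic vectors from sparse coding and pLSA are already available:

```python
import numpy as np

def blur_metric(test_topics, clear_avg_topics):
    """Pearson correlation between the test image's latent topic vector and
    the average topic vector of clear training images; higher correlation
    with the 'clear' topics indicates a sharper image."""
    t = np.asarray(test_topics, dtype=np.float64)
    c = np.asarray(clear_avg_topics, dtype=np.float64)
    t = (t - t.mean()) / t.std()
    c = (c - c.mean()) / c.std()
    return float(np.mean(t * c))
```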
Abstract: To improve the state identification of boiler combustion flame images, a state identification method based on the Log-Gabor wavelet and kernel principal component analysis (KPCA) with fractional power polynomial models is proposed. Flame images are filtered by a Log-Gabor filter bank, and the texture feature vectors of the images are constructed from the mean and standard deviation of the filtered images. KPCA with fractional power polynomial models is then used to reduce the dimension of the texture feature vectors, and the dimension-reduced vectors are classified by a support vector machine. Experimental results show that the proposed method accurately extracts the texture features of the flame images. Compared with the feature extraction method based on the Log-Gabor wavelet alone and two other feature extraction methods based on the Gabor wavelet, the proposed method achieves a higher classification rate of 76%. The variance proportion of the first principal component increases as the kernel parameter increases. High classification accuracy is achieved through the reduction in the dimension of the texture feature vectors of the flame images, and the method also exhibits a short running time and good real-time performance.
Keywords: combustion monitoring; flame image; Log-Gabor wavelet; kernel principal component analysis; support vector machine
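A minimal sketch of one Log-Gabor filter in the frequency domain and of the mean/standard-deviation feature pair described above; the center frequency f0 and bandwidth ratio are illustrative parameter choices, and a full filter bank would vary them over scales and orientations.

```python
import numpy as np

def log_gabor_radial(shape, f0=0.1, sigma_ratio=0.65):
    """Radial Log-Gabor transfer function on an FFT frequency grid."""
    rows, cols = shape
    fy = np.fft.fftfreq(rows)[:, None]
    fx = np.fft.fftfreq(cols)[None, :]
    f = np.sqrt(fx**2 + fy**2)
    f[0, 0] = 1.0                       # avoid log(0) at the DC component
    lg = np.exp(-(np.log(f / f0) ** 2) / (2.0 * np.log(sigma_ratio) ** 2))
    lg[0, 0] = 0.0                      # Log-Gabor has zero DC response
    return lg

def texture_features(img, lg):
    """Filter an image in the frequency domain; the mean and standard
    deviation of the magnitude response form one texture feature pair."""
    mag = np.abs(np.fft.ifft2(np.fft.fft2(img) * lg))
    return mag.mean(), mag.std()
```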
Abstract: To capture more abundant texture information, a new weighted local binary pattern (W-LBP) is proposed in this study. An image is divided into several sub-images, and a W-LBP texture histogram is extracted from each. The proposed algorithm adaptively weights the W-LBP histograms of the sub-images on the basis of their information entropy and concatenates all histograms to create the final texture feature. The W-LBP texture of each sub-image is extracted as follows. First, a local neighborhood is constructed from a center pixel and its eight surrounding pixels. A six-bit binary string is then obtained by comparing three pairs of horizontal and vertical pixels within the local neighborhood against an adaptive threshold. The texture feature of the local region is the weighted sum of the six bits. After the entire sub-image is traversed, the feature value of each pixel is obtained, and the statistical histogram of these values is taken as the feature of the sub-image. Experimental results on two well-known face databases show that the proposed method with nearest neighbor classification obtains correct recognition rates of 85.29% and 96.50%. By capturing abundant texture information, the proposed feature leads to a high face recognition rate, and it also provides a useful reference for object recognition in other fields.
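The entropy-based weighting and concatenation step described above is independent of the specific local pattern. A minimal sketch, assuming the per-sub-image texture histograms (such as W-LBP histograms) are already computed:

```python
import numpy as np

def entropy(hist, eps=1e-12):
    """Shannon entropy of a (possibly unnormalized) histogram."""
    p = hist / (hist.sum() + eps)
    return -np.sum(p * np.log2(p + eps))

def weighted_concat(histograms):
    """Weight each sub-image histogram by its information entropy, then
    concatenate all of them into the final texture feature vector."""
    weights = np.array([entropy(h) for h in histograms])
    weights = weights / weights.sum()
    return np.concatenate([w * h for w, h in zip(weights, histograms)])
```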
Abstract: Imaging conditions in real scenes are complex, so vehicle detection involves many challenges, among which occlusion is one of the most significant. In the object detection literature, the deformable part model, which is applicable to rigid object detection, is one of the most practical part-based models. However, this model is limited by multi-object occlusion: the loss of visual information under real clutter results in many false negatives with low detection scores in vehicle detection. To address this problem, an occlusion compensation model is proposed in this study. The model analyzes the visible probability of parts according to a single viewpoint or to multiple viewpoints to compensate for the insufficiency of part-based models and thus avoid missed detections. First, the position and similarity of each part are estimated with an appearance model in candidate regions to determine which parts are occluded and to obtain the appearance and structure scores of the object, which may be lower than normal under multi-object occlusion. Second, the visible probability of a single viewpoint considers only occlusion conditions, whereas the visible probability of multiple viewpoints accounts for occlusion states from other components; these probabilities are used to compute a compensation score for occlusion, which refines the detection score of the occluded regions. Finally, we combine the appearance, structure, and compensation scores into an integrated model to reduce false negatives. Two parameters are important in this phase: the part detection threshold, which determines whether a part is occluded, and the occlusion compensation weight, which refines the detection score of an occluded object. A high part detection threshold produces a high occlusion compensation score, which in turn leads to a high false alarm rate; the occlusion compensation weight behaves similarly. Both parameters are therefore carefully selected to control the false alarm rate while decreasing missed detections. Visible probability modeling based on a single viewpoint is suitable for simple cases, in which the visible probabilities at the same height are identical; visible probability based on multiple viewpoints applies to complex cases, in which the visible probability near a visible part is high. The model is qualitatively and quantitatively evaluated with precision-recall curves on three data sets: two drawn from the popular PASCAL and MSRC benchmarks and one from a real scene. Experimental results show that our model can maintain the false alarm rate while improving the accuracy of vehicle detection under occlusion compared with the state-of-the-art model.
Keywords: vehicle detection; occlusion; part-based model; visible probability of single viewpoint; visible probability of multiple viewpoints
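A hedged sketch of how the two parameters discussed above could interact in the integrated score; the functional form of the compensation term is an illustrative assumption, not the paper's exact model.

```python
import numpy as np

def integrated_score(appearance, structure, part_scores, visible_prob,
                     part_threshold=0.3, comp_weight=0.5):
    """Combine appearance, structure, and occlusion compensation scores.

    part_scores:  per-part appearance scores from the part filters.
    visible_prob: per-part visible probability from the single- or
                  multi-viewpoint visibility model.
    """
    part_scores = np.asarray(part_scores)
    visible_prob = np.asarray(visible_prob)
    occluded = part_scores < part_threshold      # part detection threshold
    # Compensate only the occluded parts, in proportion to how likely they
    # are to be invisible, so that low part scores caused by occlusion
    # (rather than absence) do not suppress the detection.
    compensation = np.sum(occluded * (1.0 - visible_prob) * part_threshold)
    return appearance + structure + comp_weight * compensation
```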
Abstract: Fog is a common condition that reduces the contrast of an image, bleaches surface colors, and considerably reduces the value of outdoor images. To address this problem, we propose a defogging method for a single degraded image on the basis of the dark channel and an incident light assumption. We scan the image with a window to find the window with the maximum mean brightness and use that mean value as the atmospheric light. The dark channel prior proposed by He is not suitable for images that contain a large scene, so we weaken the assumption: instead of requiring every local patch to contain a pixel with a near-zero channel value, we assume only that a pixel with a zero channel value exists somewhere in the image. Based on this assumption, we identify the darkest pixel value in the entire image and use it as the global dark channel. We use the ratio of the gray level of each point to the atmospheric light as the basis transmission of the image and use it to conduct the initial dehazing; the transmission is then stretched to the [0, 1] range. Images taken in foggy weather have almost no shadows, so we assume that the incident light on a foggy day is uniform. We estimate the transmission by using a multi-scale approach combined with retinex theory, which uses Gaussian convolution to estimate the illumination. According to the haze imaging model, a high-quality, haze-free image can then be recovered from this transmission map and the initial dehazing result. By weakening He's dark channel prior, we considerably improve its accuracy. Unlike in other methods, the transmission map of our algorithm does not exhibit apparent object contours; this fuzzy transmission map is reasonable given the scattering characteristics of fog. Experimental results indicate that the algorithm provides an accurate estimation of the transmission and that the restored images show natural colors and clear details. The algorithm also has low computational complexity and requires almost no parameters to be set. Compared with haze removal based on He's dark channel prior, our algorithm produces good results and substantially increases computing speed; it is also not limited by a poor capability to process images with thick fog, which is a key concern with Fattal's method. A large number of comparative experiments show that the algorithm significantly restores the quality of images degraded by fog. Our method is effective for both thin and thick fog, is widely applicable, relies on a simple principle, and is also applicable to grayscale images.
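A minimal sketch of the initial dehazing step described above under the weakened assumption; the window size, the transmission floor t0, and the brute-force non-overlapping window scan are illustrative simplifications.

```python
import numpy as np

def atmospheric_light(gray, win=15):
    """Mean brightness of the brightest win x win window
    (brute-force, non-overlapping scan for brevity)."""
    best = 0.0
    for y in range(0, gray.shape[0] - win + 1, win):
        for x in range(0, gray.shape[1] - win + 1, win):
            best = max(best, gray[y:y + win, x:x + win].mean())
    return best

def initial_dehaze(img, t0=0.1):
    """Invert the haze imaging model I = J*t + A*(1 - t) with the basis
    transmission taken as the ratio of gray level to atmospheric light."""
    img = img.astype(np.float64)
    gray = img.mean(axis=2)
    A = atmospheric_light(gray)
    t = gray / A                              # basis transmission
    t = (t - t.min()) / (t.max() - t.min())   # stretch to the [0, 1] range
    t = np.maximum(t, t0)                     # avoid division by ~0
    return (img - A) / t[..., None] + A, t
```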
Abstract: The finite mixture model (FMM) is an unsupervised learning method widely applied to data classification tasks, particularly image segmentation; the Gaussian mixture model is a successful example of FMM used in image segmentation. However, the segmentation result is sensitive to noise because the spatial relationship among neighboring pixels is not considered. To solve this problem, the spatially variant FMM (SVFMM) and its improvements incorporate spatial constraints into the prior distribution of each pixel and have been shown to suppress noise; the spatial constraint of SVFMM has been widely studied and improved. To further enhance the robustness of FMM against noise, a new SVFMM is proposed in this study. Based on the concept of morphological dilation, the proposed model exploits the spatial relation in the posterior probability distribution of the pixel neighborhood to reduce the interference of noise in the segmentation result. The same idea is also introduced into the prior probability distribution: spatial relationships are incorporated into the prior for spatial smoothness by redesigning morphological dilation. Morphological dilation is adopted to increase the probability of a pixel's features in the statistic rather than the feature value itself, and neighboring pixels are smoothed iteratively by the label with the highest probability in the neighborhood. To maximize the likelihood function, gradient descent rather than the expectation-maximization algorithm is employed to estimate the parameters of the proposed model. The model is implemented in MATLAB. Experimental data include synthetic images, which are corrupted by different levels of noise to test robustness, and medical computed tomography (CT) images, which are used to analyze effectiveness in real applications. Segmentation experiments show that the proposed model exhibits considerable noise suppression and computational efficiency. Compared with existing SVFMM improvements in the literature, the proposed model uses fewer parameters, is easier to implement, and has lower computational cost; it is superior to the compared models in robustness against noise, segmentation accuracy, and computational efficiency, and its computational efficiency exceeds that of most SVFMMs with spatial constraints. In the field of criminal investigation, accurate extraction of a segmented region is a prerequisite for analyzing crime-related images, and the results on the CT images show that this research can provide valuable help in analyzing similar criminal cases.
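Although the paper's implementation is in MATLAB, the dilation-based smoothing idea can be sketched in Python as follows; the 3×3 structuring element and per-pixel renormalization are assumptions for illustration.

```python
import numpy as np
from scipy.ndimage import grey_dilation

def dilate_posteriors(post):
    """Smooth per-class posterior probabilities with a gray-scale dilation.

    post: (H, W, K) array of posterior probabilities for K classes.
    Replacing each pixel's class probability by the neighborhood maximum
    favors the label with the highest probability in the neighborhood,
    which suppresses isolated noisy labels.
    """
    out = np.stack([grey_dilation(post[..., k], size=(3, 3))
                    for k in range(post.shape[-1])], axis=-1)
    return out / out.sum(axis=-1, keepdims=True)   # renormalize per pixel
```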
Abstract: Statistical models often fail to fit the texture distribution of high-resolution synthetic aperture radar (SAR) images, so obtaining good results with traditional polarimetric SAR (PolSAR) image segmentation methods is difficult. To overcome this problem, we propose a new PolSAR image segmentation method that combines the KummerU distribution with the level set framework. The proposed method defines a new energy function with the KummerU probability density function as the statistical model for high-resolution PolSAR images, making the method well suited to PolSAR image segmentation. To implement the segmentation, the parameters of the KummerU distribution are estimated with the maximum likelihood method, and the level set function is evolved by numerically solving a partial differential equation. Experiments on synthetic and real full-polarization SAR images all show good results, with accuracy above 95%, higher than that of the traditional method, which demonstrates the applicability of the proposed algorithm. The algorithm is applicable to high-resolution PolSAR images, and the experimental results indicate that our model can be used effectively in most scenes and can separate targets from the background in homogeneous, heterogeneous, and extremely heterogeneous areas.
Abstract: The spatial structures revealed in remote sensing imagery are essential pieces of information that characterize the nature and scale of the spatial variation of sea ice processes. The freezing and melting of sea ice change sea environment conditions, which in turn cause lockout, channel blocking, ship damage, and other issues. This study evaluates the potential of using the variogram of the intrinsic regionalization model to estimate sea ice density from synthetic aperture radar (SAR) intensity images. A geostatistical metric is introduced in which the spatial structures of sea ice are modeled as a combination of two stochastic second-order stationary models. Under the stationarity assumption, a spatial structure model based on second-order variograms is proposed to describe sea ice density in multi-look SAR images. First, the multi-gamma model is used to characterize continuous variations corresponding to water, the background of the sea ice. Second, a Poisson tessellation-based mosaic model is used, in which the image domain is randomly partitioned into non-overlapping cells and a random value is independently assigned to each cell. The linear combination of these two stochastic models defines the mixture model that represents the spatial structures of sea ice in the SAR intensity imagery. Finally, least squares fitting is used to estimate the parameters, and the image spatial structures are characterized by the variance weight and the variogram range of each model. The proposed algorithm is applied to Radarsat-1 images acquired on different days to identify changes in sea ice. Experimental results show that the proposed method estimates sea ice density accurately and stably: for experimental areas with real sea ice densities of 20% and 80%, the errors of the estimates are within ±10%. Applied to Radarsat-1 SAR intensity images acquired two months apart in 2008, the algorithm proves useful for detecting sea ice change in terms of intensity and size. However, these findings are limited by the number, types, and small size of the sea ice samples, and several aspects of the proposed algorithm can be improved in further studies. With the availability of high-resolution polarimetric SAR images, the problems caused by high resolution and full polarization should be considered: for a full-polarization SAR image, the spectral and spatial correlation of the same sea ice type becomes complicated, and high resolution makes both the details and the noise in homogeneous sea ice areas prominent, giving the SAR images massive-data characteristics. These issues create unexpected difficulties in designing high-resolution polarimetric SAR image-oriented algorithms for sea ice detection. In future work, the suitability of the proposed algorithm for high-resolution polarimetric SAR images will be improved by developing a spatial statistical model in spectral space.
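The second-order variogram underlying the model above is the standard geostatistical statistic. A minimal sketch along one direction, with the maximum lag as an illustrative choice; a full implementation would compute it over several directions and fit the multi-gamma plus mosaic mixture to it by least squares.

```python
import numpy as np

def empirical_variogram(img, max_lag=30):
    """gamma(h) = 0.5 * E[(Z(x + h) - Z(x))^2], computed along image rows."""
    img = img.astype(np.float64)
    gammas = np.empty(max_lag)
    for h in range(1, max_lag + 1):
        diff = img[:, h:] - img[:, :-h]      # all horizontal pairs at lag h
        gammas[h - 1] = 0.5 * np.mean(diff ** 2)
    return gammas
```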