Abstract: With the rapid development of computer multimedia, database, and computer network techniques, there are more and more images to classify and label. Replacing the traditional manual mode with computer-aided automatic image scene classification has therefore become a hot research field. Among the numerous image scene classification methods, the bag-of-visual-words (BOVW) model has become a widely adopted one; as a mid-level feature, it can narrow the gap between low-level visual features and high-level semantic features. However, reviews of the BOVW model in image scene classification are rarely seen in journals in China or abroad. Therefore, to give researchers in this field a comprehensive understanding of the method, this paper systematically summarizes these studies. Based on numerous references on the BOVW model in image scene classification over almost the past ten years, we divide the development of the BOVW model into five stages: direct application of the early bag-of-words model in the image field; study of latent semantic information in the BOVW model; study of spatial layout or structure information; study of context information; and optimization of visual-word semantics together with the introduction of new methods. Furthermore, we summarize and compare various existing BOVW models for image scene classification in terms of local feature selection, feature generation of local image patches, visual vocabulary construction, histogram representation of the bag-of-visual-words feature, optimization of visual words, and so on.
The development history of the BOVW model and the research status of BOVW-based image scene classification are reviewed, which gives a clear trail of the model's development; the numerous existing BOVW models are categorized according to their working mechanisms; the advantages and disadvantages of commonly used methods are compared; the performance evaluation method for the BOVW model is described; and the commonly used standard scene databases are collected, with their best classification accuracies given separately. As a rising research field, the study of BOVW methods in image scene classification has produced considerable progress. Research in computer vision is no longer limited to directly applying the original BOVW model to describe image content, and more and more differences between images and texts are being considered. The urgent problems to be solved are as follows: the performance of the BOVW model degrades greatly when a bag of visual words is applied to samples quite different from the training ones, while training a new bag of visual words on new samples is very time- and labor-consuming; there is still no theoretical guide for determining the size of the visual vocabulary; the relationship between visual words and semantics is still not fully exploited; and the application of the BOVW model in special fields, such as high-resolution remote sensing land-use scene classification, is far from satisfactory.
Based on these problems, some interesting research directions may include: constructing universal, self-adaptive bags of visual words for different sample sets; automatically selecting the optimal vocabulary size; adding more spatial layout and context information to the BOVW model and exploring latent semantic information in visual words; studying image visual grammars for image understanding; studying scene classification in images of special fields, such as high-resolution remote sensing images; and investigating new, well-characterized low-level feature extraction algorithms to construct high-level bags of visual words. To conclude, although a number of urgent problems remain in BOVW-based image scene classification, the importance of studying the BOVW model cannot be denied.
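The core BOVW pipeline surveyed above (local features → visual vocabulary → word histogram) can be sketched as follows. This is a generic illustration using plain k-means, not any of the specific variants reviewed; function names are ours.

```python
import numpy as np

def build_vocabulary(descriptors, k, iters=20, seed=0):
    """Cluster local descriptors (e.g., SIFT) into k visual words
    with plain k-means; real systems use optimized libraries."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)].astype(float)
    for _ in range(iters):
        # assign each descriptor to its nearest center, then recompute centers
        dist = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dist.argmin(1)
        for j in range(k):
            if np.any(labels == j):  # empty clusters keep their old center
                centers[j] = descriptors[labels == j].mean(0)
    return centers

def bovw_histogram(descriptors, vocabulary):
    """Quantize one image's descriptors against the vocabulary and
    return the normalized visual-word histogram (the mid-level feature)."""
    dist = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(-1)
    hist = np.bincount(dist.argmin(1), minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()
```

The resulting histogram is what a scene classifier (e.g., an SVM) consumes in place of raw low-level features.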
Abstract: With the continuous development of synthetic aperture radar (SAR) technology and resolution, finer spatial features can be observed in newly available high-resolution SAR images. Due to the larger data volume and higher application requirements, traditional pixel-based methods have many limitations in high-resolution SAR processing. Object-based image analysis (OBIA) provides a new idea for high-resolution remote sensing image analysis based on the pixel set, the "object", and is becoming one of the hot spots of remote sensing, photogrammetry, and GIS. The technique has been widely used in optical remote sensing; however, its application to SAR image processing is still at an early stage. The objective of this review is to provide a comprehensive overview and sound summary of the work undertaken. After briefly introducing the origin and characteristics of OBIA, the multi-scale segmentation methods commonly used for SAR images are introduced. Then the applications of OBIA to high-resolution SAR image processing are presented. Finally, conclusions are drawn and an outlook on the application of object-oriented technology to SAR images is given. OBIA has been used successfully in several SAR applications, such as image classification, information extraction in urban areas, change detection, and ocean and forest monitoring. It plays an important role in handling the scale effect and speckle reduction in high-resolution SAR images. While some excellent studies have applied OBIA to SAR images, many challenges remain and much work still needs to be done.
Abstract: Entropy coding is a technology used in the H.264 video standard. In secure video communications, entropy-coding encryption is one of the important encryption approaches. The usual H.264 entropy-coding encryption fuses encryption with entropy coding; it is fast, keeps the data format unchanged, and is format-compatible, but it must process all data frames. To address this problem, we propose a video encryption method that combines key-frame selection with entropy coding. We capture all 8×8 sub-macroblock motion vectors in a frame and construct a vector that represents the change of the frame's content. Its Euclidean norm is then compared with a threshold; if it is greater than or equal to the threshold, the current frame is a key frame, and the next frame's motion vectors and residual coefficients are encrypted during entropy coding; otherwise the frame does not participate in any encryption. Key-frame selection is controlled through the threshold value, and a chaotic sequence guarantees the security of the video. Compared with the full entropy-coding encryption algorithm, the amount of encrypted data is reduced by about 39.78%. Experiments show that the method keeps the total data size unchanged, obtains a good encryption effect at a small time cost, and significantly reduces the amount of encrypted data. The operation only changes the entropy-coded data and is independent of the other video coding processes; therefore, the encryption has no effect on the amount of encoded data. The algorithm is thus a format-compatible video encryption scheme.
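The key-frame test described above — stack the frame's 8×8 sub-macroblock motion vectors into one vector and compare its Euclidean norm with a threshold — can be sketched as follows; the function name and array layout are illustrative assumptions.

```python
import numpy as np

def is_key_frame(motion_vectors, threshold):
    """motion_vectors: (N, 2) array of the 8x8 sub-macroblock motion
    vectors of one frame. They are concatenated into a single
    content-change vector whose Euclidean norm is compared with the
    threshold; frames at or above it are key frames."""
    change = np.asarray(motion_vectors, dtype=float).ravel()
    return bool(np.linalg.norm(change) >= threshold)
```

Raising the threshold selects fewer key frames and therefore reduces the amount of data that must be encrypted.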
Abstract: Traditional zero-watermarking algorithms derive the watermark data from image features, so it is often necessary to construct meaningless watermarks. Based on existing visual-cryptography robust watermarking and combined with zero-watermarking, we present a new wavelet-domain zero-watermarking algorithm based on visual cryptography. It embeds a meaningful binary image into the original image without changing the original image. The zero watermark is generated not only from the carrier image but jointly with the visual secret shares of the watermark. The proposed algorithm generates two parts: the master share (image feature information) and the ownership share (the zero watermark). First, rational scrambling is applied to the carrier image to remove pixel correlation. Second, the wavelet transform is performed, the wavelet sub-bands are divided into blocks, and the singular value decomposition is applied to each block; each block's characteristic value is then compared with the mean value over all blocks to obtain the transition matrix. Third, the transition matrix is combined with a 2×2 visual secret sharing algorithm to generate the master share. Finally, by combining the master share with the secret watermark information, we produce the ownership share and deposit it with the certification center. Since zero-watermark information is usually not intuitive, embedding a meaningful binary image into the carrier image without modifying it addresses this issue. Even in a highly disturbed environment, the method performs better than the traditional zero-watermarking algorithm. The presented zero-watermarking algorithm is a reliable way to achieve image copyright authentication, and our experiments show that it has good security.
In addition, it is also more robust when confronted with many different types of image processing.
Keywords: zero-watermarking; copyright authentication; discrete wavelet transform (DWT); singular value decomposition (SVD); visual cryptography; visual secret sharing
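The block-SVD/mean comparison that produces the transition matrix can be sketched as below. Treating the largest singular value as each block's "characteristic value" is our reading of that step and is an assumption; the sub-band is assumed to be already available from the DWT.

```python
import numpy as np

def transition_matrix(subband, block=4):
    """Build the binary transition matrix from a wavelet sub-band:
    each block's largest singular value is compared with the mean of
    all blocks' largest singular values."""
    h, w = subband.shape
    feats = []
    for i in range(0, h - h % block, block):
        row = []
        for j in range(0, w - w % block, block):
            s = np.linalg.svd(subband[i:i+block, j:j+block], compute_uv=False)
            row.append(s[0])  # largest singular value of the block
        feats.append(row)
    feats = np.array(feats)
    return (feats >= feats.mean()).astype(np.uint8)
```

The binary matrix is then fed into the 2×2 visual secret sharing step to generate the master share.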
Abstract: As representatives of geometric active contour models, the C-V model and its improved LBF model have attracted much attention. However, both depend strongly on the initial contour curve, so they are unstable or computationally expensive during image segmentation. In this study, we first analyze the principles of the two models and the nature of their dependence on initial contours. Based on this analysis, we present a novel level set model for image segmentation using dual evolving contours. The model works as follows: 1) By setting inner and outer contours, the model approximates the target boundary from both the inside and the outside of the object. The design principle of the two contours is simple: one is selected outside the object and the other overlaps it. 2) The evolution of the two contours is controlled automatically through the related terms of the model: the two contours are driven by minimizing the difference between them, and they gradually stabilize at the target boundary from both sides. The proposed model avoids re-initialization of the signed distance function by including an internal energy functional, and it enhances the ability to capture boundaries in complex heterogeneous regions by applying a global regularization function. By evolving the internal and external contours simultaneously, the model avoids the strong dependence on the initial contour curve of traditional region-based segmentation models, so the initial contour is easy and robust to select. The segmentation results are accurate and stable.
Keywords: image segmentation; dual contour; level set model; initial contour; energy functional
Abstract: An improved single-image haze removal algorithm based on the dark channel prior is proposed to overcome the low efficiency of the classical algorithm and the color distortion it causes in bright areas. The improvements are twofold. One is the spatial adaptability of the atmospheric transmittance estimation, achieved by introducing a blocking idea. The other is that bright areas can be handled through the absolute difference between the atmospheric light and the dark channel. Compared with the classical algorithm, the time complexity is decreased and the weak transmission estimate in bright areas is remedied. Experimental results show that the improved algorithm is efficient and feasible.
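For reference, the dark channel and transmission estimate that this improved algorithm starts from can be sketched as follows. This is a straightforward, unoptimized version of the underlying prior; the paper's block-based and bright-area refinements are not shown.

```python
import numpy as np

def dark_channel(img, patch=15):
    """Per-pixel minimum over the RGB channels, followed by a minimum
    filter over a local patch. img: H x W x 3 float array in [0, 1]."""
    mins = img.min(axis=2)
    pad = patch // 2
    padded = np.pad(mins, pad, mode='edge')
    h, w = mins.shape
    out = np.empty_like(mins)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i+patch, j:j+patch].min()
    return out

def transmission(img, A, omega=0.95, patch=15):
    """Transmission estimate t(x) = 1 - omega * dark_channel(I / A),
    where A is the atmospheric light."""
    return 1.0 - omega * dark_channel(img / A, patch)
```

Haze-free pixels have a near-zero dark channel, so a large dark channel value signals haze and a correspondingly low transmission.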
Abstract: Images captured in foggy weather are often degraded by the scattering of atmospheric particles. Concerning this issue, we present a new single-image haze removal method based on bilateral filtering and the atmospheric scattering model. First, an accurate atmospheric veil is obtained by taking full advantage of the edge-preserving smoothing property of the bilateral filter. Then, to address distortion in bright portions, a method that weakens the defogging strength is proposed. Finally, the haze-free image is recovered by inverting the atmospheric scattering model. Extensive experiments show that images restored by this algorithm are clear and natural, especially in distant scenes and at abrupt depth changes. In addition, the time complexity is only linear in the number of input image pixels. The algorithm achieves excellent fog elimination, preserving details better than Tarel's defogging algorithm; at the same time, compared with the defogging algorithm proposed by He Kaiming, it has a clear advantage in run time, which is important for real-time application.
Keywords: image enhancement; image defogging; bilateral filter; atmosphere scattering model
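A minimal sketch of the veil estimation step: a brute-force grayscale bilateral filter applied to the per-pixel channel minimum. The parameter values and the exact veil formula are assumptions for illustration; production code would use an O(1) bilateral approximation.

```python
import numpy as np

def bilateral_filter(img, sigma_s=3.0, sigma_r=0.1, radius=4):
    """Brute-force grayscale bilateral filter: each output pixel is a
    weighted average, with weights combining spatial and range Gaussians,
    so smoothing stops at strong edges."""
    h, w = img.shape
    padded = np.pad(img, radius, mode='edge')
    ys, xs = np.mgrid[-radius:radius+1, -radius:radius+1]
    spatial = np.exp(-(xs**2 + ys**2) / (2 * sigma_s**2))
    out = np.empty_like(img)
    for i in range(h):
        for j in range(w):
            win = padded[i:i+2*radius+1, j:j+2*radius+1]
            range_w = np.exp(-(win - img[i, j])**2 / (2 * sigma_r**2))
            wgt = spatial * range_w
            out[i, j] = (wgt * win).sum() / wgt.sum()
    return out

def atmospheric_veil(img, p=0.95, **kw):
    """V(x) ~= p * bilateral(min over channels); the final min keeps the
    veil below the channel minimum (one common reading of the estimate)."""
    w = img.min(axis=2)
    return p * np.minimum(bilateral_filter(w, **kw), w)
```

The veil V is then plugged into the atmospheric scattering model to invert the degradation.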
Abstract: Most traditional highlight removal algorithms based on an illumination model fail on images with saturated pixels. To deal with this, this paper presents an inpainting algorithm guided by saliency detection. First, we apply the saliency model in the YUV space to detect and mark highlight areas automatically. Then, we inpaint the marked highlight areas with a modified self-adaptive exemplar-based algorithm. We test on natural-scene and synthetic images; the experimental results demonstrate that, compared with classic image inpainting and highlight removal algorithms, the results of the proposed method are more natural and of better image quality. Compared with the exemplar-based and Tan algorithms, the proposed method performs better on single images whose highlight areas are saturated and is not limited by the illumination model.
Abstract: To improve the security of a steganographic system, an adaptive steganographic algorithm for JPEG images based on the block standard deviation of discrete cosine transform (DCT) coefficients is proposed. The block standard deviation of the DCT coefficients reflects the complexity of image regions. Therefore, a distortion function is designed from the rounding distortion, the quantization step, and the standard deviation; the syndrome-trellis code (STC) is then combined with this distortion function to form the steganographic algorithm. Using the block standard deviation of the DCT coefficients makes the distortion function more rational; in this way, the total distortion of the image after embedding is reduced and the security is improved. Experiments show that the proposed algorithm possesses better undetectability. The secure embedding rate can reach about 0.35 when the quality factor is 75.
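The block-activity measure at the heart of the distortion design can be sketched as follows. The cost function here is only an inverse-activity placeholder (an assumption): the actual design additionally folds in the rounding distortion and the quantization step before being passed to the STC.

```python
import numpy as np

def block_std_map(dct_coeffs, block=8):
    """Standard deviation of each 8x8 block of quantized DCT coefficients;
    high values mark complex/textured regions where embedding costs less."""
    h, w = dct_coeffs.shape
    out = np.empty((h // block, w // block))
    for i in range(0, h, block):
        for j in range(0, w, block):
            out[i // block, j // block] = dct_coeffs[i:i+block, j:j+block].std()
    return out

def distortion_cost(std, eps=1e-6):
    """Illustrative per-block cost, inversely related to block activity:
    smooth blocks are expensive to modify, busy blocks are cheap."""
    return 1.0 / (std + eps)
```

The STC then embeds the payload while (approximately) minimizing the total of these per-coefficient costs.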
Abstract: Lens distortion, a fundamental problem in photogrammetry and computer vision, seriously affects three-dimensional reconstruction and geometric image measurements. In this paper, an automatic correction of weak radial distortion based on vanishing points from a single image is presented. To reduce the effects caused by the nonlinearity introduced by the distortion center, the distortion center is further optimized after the radial distortion is estimated. A nonlinear model of vanishing points and radial distortion, which can be estimated by the Levenberg-Marquardt (LM) algorithm, is established from the geometric constraints of vanishing points; the distortion center and radial distortion are then optimized iteratively using a suitable quality measure. Finally, the presented method is tested and verified. Radial distortions in different data sets are all effectively corrected. Compared with traditional non-metric distortion correction methods, the camera calibration results after correction are significantly improved. Our experiments indicate that the approach can effectively correct radial lens distortion while overcoming the instability of traditional non-metric distortion correction methods.
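As a worked illustration of the one-parameter radial model commonly used in this setting, x_d = x_u (1 + k1 r_u^2), a distorted point can be corrected by fixed-point iteration. Both the model choice and the inversion scheme are generic assumptions, not this paper's LM-based vanishing-point estimation.

```python
def undistort_point(xd, yd, k1, cx=0.0, cy=0.0, iters=10):
    """Invert the radial model x_d = x_u (1 + k1 * r_u^2) about the
    distortion center (cx, cy) by fixed-point iteration; converges
    quickly for the weak distortions targeted here."""
    xd, yd = xd - cx, yd - cy
    xu, yu = xd, yd  # initial guess: no distortion
    for _ in range(iters):
        r2 = xu * xu + yu * yu
        xu, yu = xd / (1 + k1 * r2), yd / (1 + k1 * r2)
    return xu + cx, yu + cy
```

For weak distortion (small k1) each iteration shrinks the error by a large factor, so a handful of iterations suffices.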
Abstract: This paper addresses the problem of image set matching with a kernel learning approach, and a locality-geometry-structure-preserving discriminant analysis on a Riemannian manifold is presented for image set matching. Riemannian manifolds are an effective way to represent image sets, which are mapped to data points on the manifold; recognition can then be performed by applying discriminant analysis on such manifolds. However, the local structure of the data points is not exploited in discriminant analysis. In computer vision applications, multi-view facial images are nonlinearly distributed, and features often lie on Riemannian manifolds with known geometry. Set-based matching methods take a set of images as input and model the probe set and gallery set individually; hence, they can fully utilize the information provided by multiple images to obtain better matching and recognition accuracy. However, popular learning algorithms such as discriminant analysis and support vector machines are not directly applicable to such features because of the non-Euclidean nature of the underlying spaces. To overcome this limitation, each image set is mapped into a high-dimensional feature space, e.g., a reproducing kernel Hilbert space (RKHS), using a nonlinear mapping function, to which many Euclidean algorithms can be generalized. For set-based image matching, the key issues are how to represent the image sets and how to measure the similarity between two image sets. We naturally formulate image set matching as matching points on the Riemannian manifold spanned by symmetric positive definite (SPD), i.e., nonsingular, covariance matrices. In general, the success of kernel-based methods is largely determined by the choice of the kernel function. By exploiting an efficient metric for SPD covariance matrices, i.e.,
the Log-Euclidean distance, we derive a kernel function that explicitly maps a nonsingular covariance matrix from the Riemannian manifold to a Euclidean space. Unlike other methods on Riemannian manifolds, the local structure of the data is taken into account. With this explicit mapping, a kernel version of locality preserving projection (LPP) is applied to keep the local geometric structure of the image set, and an image set-based matching method is proposed. The proposed method is evaluated on set-based object classification and face recognition tasks. Extensive experimental results show that it outperforms other state-of-the-art set-based object matching and face recognition methods: under our experimental settings, it reaches recognition rates of 91.5% and 65.31% on the public ETH80 object database and the YouTube Celebrities video database, respectively. In summary, we have proposed an efficient image set matching method that represents each image set with its covariance matrix and models the problem as matching points on the Riemannian manifold spanned by nonsingular covariance matrices. We derived a Mercer kernel function that bridges the gap between traditional learning methods operating in vector spaces and learning on a manifold. Comparison experiments show that it is generally better than other image set matching methods.
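The Log-Euclidean Gaussian kernel described above can be sketched directly: the matrix logarithm flattens the SPD manifold into a vector space, where a Gaussian kernel on the Frobenius distance is a valid Mercer kernel (gamma is an illustrative parameter).

```python
import numpy as np

def log_spd(C):
    """Matrix logarithm of an SPD matrix via eigendecomposition:
    log C = V diag(log w) V^T."""
    w, V = np.linalg.eigh(C)
    return (V * np.log(w)) @ V.T

def log_euclidean_kernel(C1, C2, gamma=0.5):
    """k(C1, C2) = exp(-gamma * ||log C1 - log C2||_F^2), a Gaussian
    kernel under the Log-Euclidean metric on SPD covariance matrices."""
    d2 = np.linalg.norm(log_spd(C1) - log_spd(C2), 'fro') ** 2
    return np.exp(-gamma * d2)
```

Any kernel method, such as the kernel LPP used here, can then operate on the Gram matrix of these kernel values between image-set covariance matrices.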
Abstract: The local binary fitting (LBF) model can segment images with intensity inhomogeneity because it fits local energies using the neighborhood information of each pixel. However, LBF considers only local information and ignores global information, which makes it sensitive to the size, shape, and position of the initial contour. To solve these problems, a "two-stage" active contour model combining local and global image information is proposed in this paper. In the first stage, a degenerated Chan-Vese model and the global information (mean gray value) of the image are used to locate the target roughly but quickly. In the second stage, local information (local Gaussian fitting) is employed to obtain a more accurate segmentation; the initial contour of stage two is the zero level set at the end of stage one. The experimental results show that the proposed method keeps the advantage of the LBF model of being effective for inhomogeneous images, while being robust to the selection of the initial contour (size, shape, and position) and to noise.
Abstract: Appearance models and object features are commonly used for object matching. In long-term tracking with large scale variations, shape deformation, and other noise, it is very challenging to keep tracking successfully this way. An effective object appearance model is proposed that improves the efficiency and effectiveness of object tracking. Image cues are used to describe the object appearance. After image segmentation, information is extracted from the superpixels (each segmented block is one superpixel). Their SIFT descriptors are then clustered to form a codebook, and the weight of each word in the codebook is calculated to construct the target model used to filter superpixel points. Next, the pyramidal Lucas-Kanade tracker is used to predict the location of the superpixel points in the next frame and to move the tracking window. Combined with weighting of the point displacements, variations in scale and shape deformation can be handled. Experimental results show that the proposed method performs well and robustly even under appearance deformation and illumination changes.
Abstract: Edge detection from remote sensing data is an important step of automatic target recognition. However, objects in high-resolution remote sensing images are complex and detail-rich, so edge detection based on the phase congruency model yields much noise and many pseudo edges. Combining phase congruency with the total variation model, an edge detection approach for high-resolution remote sensing images is proposed. First, following the principle of phase congruency, a two-dimensional phase congruency model is constructed using log-Gabor filters. Second, the edge response map produced by the phase congruency model is improved by a total variation model: noise removal and pseudo-edge suppression are achieved by constraining image smoothness in the space of bounded variation. Edges are then detected from the improved edge response map. Compared with the phase-congruency-based method and the Canny algorithm, the experimental results show that the proposed approach eliminates noise arising from the internal detail of similar objects, suppresses pseudo edges produced by the phase congruency model, highlights real target edges, and correctly extracts target contour information. The approach thus extracts edges from high-resolution remote sensing images effectively and is helpful for subsequent automatic target recognition.
Abstract: To overcome the shortcoming of traditional fusion methods that measure pixel sharpness with a single feature, and to exploit non-subsampled contourlet transform (NSCT) coefficients together with human visual perception characteristics, a novel NSCT-based multifocus image fusion method is presented. The method consists of three steps. First, the source images of the same scene are decomposed by the NSCT. Second, the lowpass sub-band coefficients are fused by a new combination of local visibility, local visual contrast, and local texture features, while the bandpass sub-band coefficients are fused by a normalized, correlation-weighted local visual feature contrast that uses the information of neighborhood and cousin coefficients. Third, the fused image is reconstructed by the inverse NSCT. Compared with a series of fusion methods, including the discrete wavelet transform (DWT), shift-invariant DWT (SIDWT), CT, NSCT, and a method that fuses with neighborhood and cousin coefficient information, the proposed method achieves better visual effect and higher values of edge information retention and mutual information. Experimental results confirm the effectiveness of the proposed method in both qualitative and quantitative evaluation.
Abstract: In pen-based military situation marking systems, recognizing hand-drawn symbols is confronted with several challenges, such as the large number of graphic symbol classes, high between-class similarity, and the orientation variation of many rotatable symbols. Considering these difficulties, a rotation-free recognition paradigm is presented, aiming both at classifying an instance of a symbol and at estimating its rotation angle. First, rotation-invariant coarse classification is performed to narrow the range of candidate classes. Then the rotation angle between the unknown instance and the template instance is estimated, so the two can be rotationally aligned by compensating for that angle. Finally, fine classification methods can be applied to distinguish similar symbols. A novel Zernike-moments-based descriptor, called DZM, is used to represent hand-drawn symbol samples; it combines the spatial distribution of sample points with their local direction information. By matching DZM features, both coarse classification and rotation angle estimation can be accomplished. Experimental results show that the proposed method outperforms the traditional Zernike moment method in both classification and rotation angle estimation of hand-drawn military situation marking symbols, and it can be applied effectively to rotation-free recognition of online hand-drawn military marking symbols.
Abstract: When the hand and the forearm both enter the available depth range of a depth camera, their data are extracted together. Processing these data as a whole may degrade important algorithms such as palm center estimation, hand orientation estimation, and hand tracking. The palm center is quite stable during gesture interaction, so the line through the palm center and the center of the hand cluster is usually used as a hand orientation indicator; improving palm center estimation therefore raises overall performance. To separate the hand from the forearm correctly, we start from the motion features of the wrist and the contour features of the hand, take advantage of the geometric characteristics of an inscribed rectangle, and propose a wrist recognition algorithm. To improve palm center estimation, we analyze the geometric characteristics of an acute triangle and an inscribed circle, combine them with the features of hand interaction, and propose a new palm center estimation algorithm. The algorithms are tested in an in-air multi-touch system. The proposed algorithm runs nearly 7 times faster than the original algorithm while keeping the estimated palm center coordinate stable, with a deviation of no more than 3 pixels. Moreover, the success rate and accuracy of palm center estimation are improved by the wrist recognition algorithm. Our experiments show that the wrist recognition algorithm separates the hand from the forearm well, and the new palm center estimation algorithm supports real-time interaction.
Keywords: wrist recognition; palm center estimation; depth camera; hand segmentation; gesture interaction
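A minimal sketch of the inscribed-circle idea behind palm center estimation: take the palm center as the mask pixel farthest from the background, i.e., the center of the largest inscribed circle. This brute-force pairwise version is our illustration; in practice a distance transform does the same job in linear time.

```python
import numpy as np

def palm_center(mask):
    """Center of the largest circle inscribed in the hand mask:
    the foreground pixel whose distance to the nearest background
    pixel is maximal. mask: 2D boolean array."""
    inside = np.argwhere(mask)
    outside = np.argwhere(~mask)
    # squared distance from every inside pixel to every outside pixel
    d = ((inside[:, None, :] - outside[None, :, :]) ** 2).sum(-1)
    nearest = d.min(axis=1)
    return tuple(inside[nearest.argmax()])
```

Running the estimate on the hand region only (after wrist-based forearm removal) is what keeps this center stable.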
Abstract: License plate recognition (LPR) is the core module of an intelligent transportation system. LPR algorithms generally consist of three steps: 1) detection of the license plate region; 2) segmentation of the plate characters; 3) recognition of each character. License plate detection is the key step of LPR, and its result directly determines the performance of the whole system. Most current license plate detection methods employ a single feature, such as the edge, structure, or color feature, to locate the plate, and cannot obtain satisfactory results. To improve the accuracy and speed of license plate detection and to reduce the false detection rate, we propose a detection method based on multiple features. First, edge density information is used to remove most of the background area, which greatly speeds up detection: we divide the image into small cells, compute the edge density of each cell, and remove cells whose edge density is too large or too small. Then we use the distribution of license plate characters to precisely detect plates in the remaining regions: morphological operators are used to connect the character regions, and the Hough transform is used to obtain the position of the plate. After that, we segment the plate into characters and compute the histogram of oriented gradients (HOG) features of each character, which are used to verify whether it is a valid plate character (letter or digit). If a candidate contains more than five valid characters, it is regarded as a true plate.
We establish a dataset containing 9980 high-resolution images and test our algorithm in three configurations: detection using 1) context and structure information, 2) structure and part information, and 3) context, structure, and part information. The experimental results show that by employing the context information, most of the background area can be filtered out and the detection speed improved, while the structure and part information removes most of the false candidates. The detection rate of our method is 97.9%, and the average detection time is 16.3 ms. License plate detection is the fundamental step of LPR systems. Motivated by the fact that people detect objects through multiple features, we propose a license plate detection method that combines multiple features of the plate: context information filters most of the background and improves detection speed, and structure and part information removes most of the false candidates. The experimental results show that the proposed method detects license plates accurately and quickly under various conditions.
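The coarse background-filtering step, dividing the image into cells and discarding those whose edge density is too low or too high, can be sketched as follows; the cell size and the two thresholds are illustrative assumptions.

```python
import numpy as np

def edge_density_filter(edge_map, cell=16, lo=0.05, hi=0.6):
    """Split a binary edge map into cells and keep only those whose
    edge density falls in [lo, hi]: plate regions are edge-rich, but
    near-saturated cells are usually texture/noise, not plates.
    Returns the top-left coordinates of the surviving cells."""
    h, w = edge_map.shape
    keep = []
    for i in range(0, h - h % cell, cell):
        for j in range(0, w - w % cell, cell):
            density = edge_map[i:i+cell, j:j+cell].mean()
            if lo <= density <= hi:
                keep.append((i, j))
    return keep
```

Only the surviving cells are passed to the more expensive character-distribution and HOG verification stages, which is where the speedup comes from.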
Abstract: The H.264/AVC coding standard effectively improves coding efficiency by using intra and inter prediction techniques. However, due to the use of variable block sizes and rate-distortion optimization, inter prediction in H.264/AVC has extremely high computational complexity, which limits its applications. This is because the encoder needs to exhaustively check all the prediction modes to identify the best one. Therefore, reducing the complexity of the encoder is very important, especially for real-time applications. Choosing a small number of candidate modes, rather than all the modes, for rate-distortion optimization can reduce the complexity and speed up the encoding process. In this paper, a fast intermode decision algorithm for P-frame encoding based on a decision tree is proposed to reduce the number of modes that must be checked. When a macroblock (MB) of a P-frame is encoded with inter prediction, inter-frame motion estimation for the 16×16 partition is performed first. We then use the information of the residual MB after motion compensation to select a small number of candidate modes from all possible prediction modes for rate-distortion cost calculation and comparison. The number of 4×4 all-zero blocks in the residual MB indicates how accurate the 16×16 inter prediction is. Thus, through statistical analysis of the correlation between the number of 4×4 all-zero blocks in the residual MB and the best prediction mode found by the full search algorithm, the candidate modes of some MBs can be determined directly from the number of all-zero blocks in the residual MB. Then, for the remaining MBs, machine-learning tools are used to exploit the correlation between the residual MB information and the best mode in the H.264/AVC full search algorithm. Using the decision tree classification algorithm, candidate modes can be selected based on the SATD features of the residual MB.
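The first decision rule above, mapping the number of 4×4 all-zero blocks in the residual MB to a reduced candidate-mode set, can be sketched as follows; the thresholds and the mode sets are hypothetical values chosen only to illustrate the mechanism, not the statistics derived in the paper:

```python
import numpy as np

def count_all_zero_blocks(residual_mb, block=4):
    """Count the 4x4 all-zero blocks in a 16x16 residual macroblock.
    Many all-zero blocks mean the 16x16 inter prediction is already
    accurate, so large-partition modes are likely to win."""
    n = 0
    for i in range(0, residual_mb.shape[0], block):
        for j in range(0, residual_mb.shape[1], block):
            if not residual_mb[i:i + block, j:j + block].any():
                n += 1
    return n

def candidate_modes(residual_mb):
    """Map the all-zero-block count to a candidate-mode set
    (illustrative thresholds, not the paper's)."""
    zeros = count_all_zero_blocks(residual_mb)
    if zeros >= 14:                       # near-zero residual
        return ["SKIP", "16x16"]
    if zeros >= 8:                        # moderately accurate prediction
        return ["16x16", "16x8", "8x16"]
    return ["16x8", "8x16", "8x8"]        # detailed motion: small partitions
```

MBs whose count falls near the decision boundaries would, in the full scheme, be passed on to the SATD-feature decision tree instead of being classified by this rule alone.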
The proposed algorithm efficiently reduces the number of candidate modes for P-frame inter prediction in the H.264/AVC encoder. Experimental results show that the proposed algorithm achieves significant encoding time savings for all the test video sequences, which cover a wide range of motion activities. Meanwhile, compared with the reference full search algorithm, there is only a slight PSNR degradation and a small bit-rate increase. Although the time saving of the proposed algorithm is not better than those of the comparative algorithms, the time savings for video sequences with different degrees of motion are almost identical, and the rate-distortion performance is obviously better. The H.264/AVC video coding standard effectively improves coding efficiency, but at the cost of high computational complexity. This paper presents a fast and efficient intermode decision algorithm to speed up the encoding process. The algorithm reduces the computational complexity by selecting a small number of candidate modes from the set of all possible prediction modes. We use the decision tree classification algorithm to determine candidate modes based on the information in the residual MB after 16×16 inter motion estimation. The experimental results indicate that our algorithm reduces the computational complexity with only a small loss in PSNR and a small increase in the total bit rate.
Abstract: As a research hotspot in recent years, no-reference (NR) image quality assessment (IQA) has profound practical significance and broad application value. We present a new NR IQA method based on mutual information (MIQA). Original natural images and their corresponding normalized luminance fields and local standard deviation fields are used as inputs. Self-correlated mutual information is used to quantify the correlations between neighboring pixels for the three categories of inputs, and the quantization results are used as features. In addition, multiscale analysis is introduced to obtain mutual information features across two scales. An image distortion classifier and a quality prediction model are trained with a support vector machine (SVM) on the LIVE image database and then used to conduct NR IQA across multiple categories of distortions. We evaluate the performance of the proposed algorithm on the LIVE image database. The experimental results show that the mean correlation coefficient between the quality judgment of this algorithm and human subjective quality judgment reaches 0.93, and the overall classification accuracy reaches 79%, delivering a performance that is competitive with the most popular full-reference (FR)/NR IQA methods. The presented method differs from traditional NR IQA methods based on image transforms. Since natural images are highly structured, we focus on the inherent correlations between neighboring pixels of natural images, rather than on the distribution of transformed coefficients, and obtain good performance. Since the method is built without any image transforms and is a global method, it has relatively low time complexity.
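The core feature, mutual information between neighboring pixels estimated from the image itself, can be sketched as follows; restricting the pairs to horizontal neighbors and the choice of 16 histogram bins are simplifying assumptions of this sketch, not details from the paper:

```python
import numpy as np

def neighbor_mutual_information(img, bins=16):
    """Estimate the mutual information (in bits) between horizontally
    adjacent pixel values of a grayscale image, using a joint
    histogram as a plug-in estimate of the joint distribution.
    Distortions that disturb the natural inter-pixel structure
    change this value, which is what makes it usable as a feature."""
    a = img[:, :-1].ravel()               # left pixel of each pair
    b = img[:, 1:].ravel()                # right neighbor of each pair
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = joint / joint.sum()             # empirical joint distribution
    px = pxy.sum(axis=1, keepdims=True)   # marginal of the left pixel
    py = pxy.sum(axis=0, keepdims=True)   # marginal of the right pixel
    nz = pxy > 0                          # avoid log(0) terms
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())
```

In the full method such a statistic would be computed for the original image as well as for its normalized luminance and local standard deviation fields, at two scales, and the resulting feature vector fed to the SVM classifier and regressor.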