Abstract: Moving-object detection is important in many computer vision tasks. Background modeling is the traditional and usual approach to moving-object detection. However, most background-modeling methods are pixel-based, rely on overly simple assumptions about the background, and have difficulty handling real videos. Recently, robust principal component analysis (RPCA), which is based on low-rank and sparse decomposition, has been studied for moving-object detection by a growing number of researchers. To give more researchers a comprehensive understanding of RPCA and to encourage its use in moving-object detection, this survey conducts a thorough review of RPCA-based moving-object detection algorithms. In recent years, RPCA has received substantial attention in computer vision because of its ability to capture slight variations in background appearance via a low-rank matrix. To date, many improved algorithms and applications based on RPCA have emerged in the computer vision field. In this paper, recent studies on RPCA-based moving-object detection are reviewed in detail. We classify RPCA-based moving-object detection methods into five categories: error mitigation, Bayesian theory, temporal and spatial information, multi-feature, and multi-factor coupling. In addition, this paper summarizes and analyzes studies on RPCA methods and their applications to moving-object detection both domestically and internationally. We employ the change detection dataset to compare the performance of methods in the different categories against the original RPCA, using metrics such as recall, precision, F-measure, and time consumed for objective evaluation. We also illustrate the foreground segmentation results achieved by these methods for subjective evaluation. Experimental results indicate that these improvements solve certain problems of the original RPCA and achieve better performance than it. RPCA is a popular research topic in the computer vision field today; however, it has certain limitations that should be studied further. Incorporating prior knowledge of foreground objects in video into RPCA is an emerging trend.
Keywords: object detection; computer vision; background modeling; robust principal component analysis; review
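For readers new to the formulation, the decomposition at the heart of RPCA is principal component pursuit: minimize ||L||_* + λ||S||_1 subject to M = L + S. The sketch below is a minimal NumPy implementation of the widely used inexact augmented Lagrange multiplier (IALM) solver with common default parameters, not the code of any one surveyed paper; for moving-object detection, each column of M is a vectorized frame, L models the background, and S carries the moving objects.

```python
import numpy as np

def rpca_ialm(M, lam=None, tol=1e-7, max_iter=500):
    """Principal component pursuit via inexact ALM:
    min ||L||_* + lam * ||S||_1  s.t.  M = L + S."""
    m, n = M.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    norm_two = np.linalg.norm(M, 2)
    Y = M / max(norm_two, np.abs(M).max() / lam)   # dual variable init
    mu, rho = 1.25 / norm_two, 1.5
    L, S = np.zeros_like(M), np.zeros_like(M)
    for _ in range(max_iter):
        # low-rank update: singular value thresholding
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / mu, 0)) @ Vt
        # sparse update: elementwise soft thresholding
        T = M - L + Y / mu
        S = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0)
        Z = M - L - S                               # residual
        Y += mu * Z
        mu = min(mu * rho, 1e7)
        if np.linalg.norm(Z) / np.linalg.norm(M) < tol:
            break
    return L, S
```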
Abstract: Unlike a mobile robot in an indoor structured environment, an outdoor robot must recognize non-geometric terrain characteristics within a reasonable time and adjust its path, gait, and motion-planning strategies to cope with the terrain. Visual terrain classification has therefore become a hot topic in outdoor mobile robot research. The bag-of-visual-words (BOVW) framework, which aggregates low-level visual descriptors and links them to semantic features, has become the most common and effective paradigm for visual terrain classification. In this paper, we provide a comprehensive study of each step of the BOVW framework for visual terrain classification. Diverse methods for each step are introduced and summarized, and their characteristics and relations are explored. The BOVW framework includes four main steps: 1) feature extraction, 2) codebook generation, 3) feature coding, and 4) pooling and normalization. Feature extraction acquires low-level feature information from terrain images to produce local descriptors. In the codebook generation step, a codebook is formed through clustering. The coding step uses the codebook to map the descriptors of a terrain image into the coding space. Coding results are then aggregated into a single fixed-length vector, the mid-level feature, by pooling and normalization. Finally, the mid-level feature is fed into a linear or nonlinear classifier, such as an SVM, for terrain classification. The diverse methods in each step are summarized and compared systematically, and their performance is preliminarily tested on a terrain dataset. On the basis of this comparison of different BOVW configurations, we find that every step contributes crucially to the final classification performance, and an improper choice in one step markedly weakens the effectiveness and efficiency of the visual classification system as a whole. New handcrafted descriptors specific to visual terrain, modified BOVW frameworks, and feature fusion are three potential research directions. Visual terrain classification is an important technology for recognizing non-geometric terrain characteristics for outdoor mobile robots. Compared with other sensors, visual information most closely resembles the manner by which humans perceive the environment and provides richer terrain information. However, visual appearances of the same terrain type may differ vastly, and different terrain types may appear highly similar; these issues pose numerous challenges to visual terrain classification. Both effectiveness and efficiency must be taken into account in the design of a visual terrain classification system. Therefore, studies on the BOVW framework for visual terrain classification are of considerable significance.
Keywords: visual terrain classification; non-geometric hazard; bag of words; encoding methods; pooling methods; mobile robot
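As a concrete illustration of steps 2-4, the sketch below uses one common configuration of the framework: k-means codebook generation, hard-assignment coding, average pooling, and L2 normalization, implemented with scikit-learn. It is only one of the many choices the survey covers; step 1 (extracting local descriptors such as SIFT from terrain images) is assumed to have been done elsewhere.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(descriptor_sets, k=64):
    """Step 2: cluster all training descriptors into a k-word codebook."""
    return KMeans(n_clusters=k, n_init=10).fit(np.vstack(descriptor_sets))

def encode_image(descriptors, codebook):
    """Steps 3-4: hard-assignment coding, average pooling, L2 normalization."""
    words = codebook.predict(descriptors)            # nearest codeword per descriptor
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    hist /= len(descriptors)                         # average pooling
    return hist / (np.linalg.norm(hist) + 1e-12)     # L2 normalization
```

The resulting mid-level features can then be fed to, e.g., sklearn.svm.LinearSVC for the final classification step.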
Abstract: Obtaining an image with all objects in focus is difficult because of the limited depth of focus of optical lenses. Multi-focus image fusion aims to generate a sharper image by integrating complementary information from multiple source images of the same scene. To improve fused-image quality, a novel algorithm based on the non-subsampled quaternion shearlet transform (NSQST) is proposed in this paper. First, the source images are decomposed by NSQST to obtain low- and high-frequency sub-band coefficients. For the low-frequency sub-band coefficients, an improved sparse-representation-based fusion rule is presented; for the high-frequency sub-band coefficients, a scheme combining a new improved spatial frequency, edge energy, and a local similarity-matched degree is presented. Finally, the fused image is obtained by the inverse NSQST. Compared with five other fusion methods, the proposed method obtains better visual effects and objective evaluation scores; its fusion quality indexes increase by 3.6%, 2.9%, 1.5%, 5.2%, 3.7%, 3.2%, 3.2%, 3.0%, 6.2%, 3.8%, 3.4%, and 8.6% over the results of the NSCT-SR method. Multi-focus images are used in our experiments, and the results show that this method can be further applied in target recognition, medical diagnosis, and other fields.
Keywords: non-subsampled quaternion shearlet transform; multi-focus image fusion; sparse representation; improved spatial frequency
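Since no public implementation of NSQST is available, the sketch below illustrates only the overall fusion skeleton, with an ordinary discrete wavelet transform (PyWavelets) as a stand-in for the decomposition: a plain average replaces the paper's sparse-representation rule for the low-frequency band, and a coefficient-magnitude choose-max replaces its improved spatial frequency/edge energy/similarity rule for the high-frequency bands.

```python
import numpy as np
import pywt

def fuse_multifocus(img_a, img_b, wavelet='db4', levels=3):
    """Multi-scale fusion skeleton: decompose, fuse bands, reconstruct."""
    ca = pywt.wavedec2(img_a, wavelet, level=levels)
    cb = pywt.wavedec2(img_b, wavelet, level=levels)
    fused = [(ca[0] + cb[0]) / 2.0]        # low frequency: average (SR rule stand-in)
    for da, db in zip(ca[1:], cb[1:]):     # high frequency: activity choose-max
        fused.append(tuple(np.where(np.abs(a) >= np.abs(b), a, b)
                           for a, b in zip(da, db)))
    return pywt.waverec2(fused, wavelet)
```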
Abstract: Pulse-coupled neural networks (PCNN) often cannot achieve optimum efficiency in image fusion because of the difficulty of setting the parameters of the PCNN model, which are numerous and intrinsically linked. To overcome this parameter-setting problem, a novel technique that optimizes the PCNN model parameters with multi-objective particle swarm optimization (PSO) is presented in this paper. The method consists of three steps. First, the PCNN model parameters are optimized using multi-objective PSO, yielding the optimal PCNN model. Second, the dual-tree complex wavelet transform (DTCWT) is used for multi-scale decomposition of the source images: the high-frequency components are processed by the optimal PCNN model, while the low-frequency components are fused by the sum of modified Laplacian (SML). Finally, the fused image is reconstructed by the inverse DTCWT. Compared with many fusion methods, such as DTCWT, the Laplacian pyramid algorithm, and the non-subsampled contourlet transform, quantitative analysis of the fused image is conducted under indexes such as mutual information, entropy, image quality factor, and standard deviation. The proposed method obtains better visual effects and higher values of edge-information retention and mutual information. Image fusion is an important research field in image processing. This study proposes a novel method that combines PSO and PCNN to perform image fusion. Experimental results show that the proposed method is effective and can be applied in computer vision, medicine, and remote sensing.
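The sum of modified Laplacian used for the low-frequency bands has a simple concrete form. The sketch below computes it and applies the common choose-max-SML fusion rule; the window size and the wrap-around border handling via np.roll are our simplifying assumptions, not necessarily the paper's exact settings.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def sml(img, step=1, window=3):
    """Sum of modified Laplacian: a focus measure (larger = sharper)."""
    ml = (np.abs(2 * img - np.roll(img, step, 1) - np.roll(img, -step, 1)) +
          np.abs(2 * img - np.roll(img, step, 0) - np.roll(img, -step, 0)))
    return uniform_filter(ml, size=window) * window ** 2   # windowed sum

def fuse_lowfreq(lo_a, lo_b):
    """Per-pixel choose-max-SML rule for the low-frequency coefficients."""
    return np.where(sml(lo_a) >= sml(lo_b), lo_a, lo_b)
```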
Abstract: Combining region-merging and graph-cut (GC) algorithms while disregarding fuzzy image features leads to low efficiency, low segmentation precision, and invalid segmentation of blurry edges. To solve these problems, we propose a method that uses the maximum fuzzy 2-partition entropy, computed recursively, to design the likelihood term of the GC energy function, in a model built with regions as vertices. First, a bilateral filter and the watershed algorithm are used to pre-process the input image, over-segmenting it into small regions. Second, based on the maximum fuzzy 2-partition entropy of the rock particles, the corresponding membership functions are used to set the GC likelihood term, yielding a more realistic energy function. Meanwhile, to improve the search efficiency for the maximum fuzzy 2-partition entropy, a recursive algorithm with time complexity O() is presented that converts the fuzzy entropy computation into a recursive process; non-repeated intermediate moment results are stored for the subsequent exhaustive optimization. Finally, the designed region merging and GC assign region labels and complete the segmentation. Experimental results indicate that the segmentation precision of the proposed algorithm improves by about 23%, and the running time is 60% shorter, compared with existing combinations of region-merging and GC algorithms. The relative error of our statistical results is 2% with respect to manual statistics. The proposed method improves segmentation efficiency while ensuring segmentation precision, providing an important reference for the engineering practice of automatic and efficient rock-particle segmentation.
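To make the recursion idea concrete, the sketch below evaluates a fuzzy 2-partition entropy for every candidate threshold in O(1) each by precomputing prefix sums of the histogram and of its first moment, instead of re-summing for every threshold. The linear-ramp membership of half-width c and this particular entropy form are simplifying assumptions, not the paper's exact definitions.

```python
import numpy as np

def max_fuzzy_entropy_threshold(hist, c=32):
    """Search all thresholds in O(1) each via prefix sums of the
    histogram (P) and of its first moment (Q)."""
    p = hist.astype(float) / hist.sum()
    x = np.arange(len(p))
    P = np.concatenate(([0.0], np.cumsum(p)))        # P[i] = sum(p[:i])
    Q = np.concatenate(([0.0], np.cumsum(p * x)))    # Q[i] = sum((p*x)[:i])
    best_t, best_h = None, -np.inf
    for t in range(c, len(p) - c):
        a, b = t - c, t + c                          # ramp support [a, b)
        ramp = ((Q[b] - Q[a]) - a * (P[b] - P[a])) / (2.0 * c)
        p_fg = ramp + (P[-1] - P[b])                 # fuzzy foreground mass
        p_bg = 1.0 - p_fg
        h = -sum(q * np.log(q) for q in (p_bg, p_fg) if q > 0)
        if h > best_h:
            best_h, best_t = h, t
    return best_t
```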
Abstract: To improve the performance of neutrosophic C-means clustering on noisy images and non-convex irregular data, a neutrosophic C-means clustering algorithm in kernel space is presented. The fuzzy C-means clustering algorithm is widely used in image segmentation because of its simplicity. However, it directly clusters the pixel values at every location, which leads to a huge number of samples, and it ignores the spatial neighborhood information of pixels, using only the gray value of each sample; the clustering result therefore has poor anti-noise performance. In this study, to overcome these limitations, the neutrosophic set is introduced into the traditional fuzzy C-means algorithm. Neutrosophic C-means clustering segments noisy or singular data effectively and handles boundary samples well, but with the Euclidean distance it is not suitable for clustering data of complex structure. Thus, the kernel-function concept is introduced into the neutrosophic C-means algorithm: for kernels satisfying the Mercer condition, a nonlinear transformation maps the linearly non-separable input pattern space of low dimension into a linearly separable high-dimensional feature space. Following reproducing kernel Hilbert space theory, the samples are mapped into this high-dimensional feature space by a nonlinear function, which changes the data distribution characteristics, and neutrosophic C-means clustering is then performed there; this yields the proposed kernel-space neutrosophic C-means clustering algorithm. Numerous image segmentation experiments show that clustering in kernel space improves the noise robustness and classification performance of the existing algorithm. Salt-and-pepper, Gaussian, mixed, and multiplicative noise tests are conducted on four C-means clustering algorithms, namely the fuzzy, kernel fuzzy, neutrosophic, and kernel neutrosophic variants, and the peak signal-to-noise ratios of the four segmentation results are compared. In a large number of segmentation tests with different additive and multiplicative noises, the new algorithm segments significantly better and with good robustness, and its validity is verified.
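The kernel trick described above has a compact closed form for a Gaussian kernel: the feature-space distance is ||φ(x) − φ(v)||² = 2(1 − K(x, v)). The sketch below applies it to plain kernel fuzzy C-means, the core that the neutrosophic variant extends with indeterminacy and falsity memberships (omitted here for brevity).

```python
import numpy as np

def kernel_fcm(X, c=3, m=2.0, sigma=1.0, n_iter=50, seed=0):
    """Kernel fuzzy C-means with a Gaussian kernel: the feature-space
    distance ||phi(x) - phi(v)||^2 = 2 * (1 - K(x, v)) replaces the
    Euclidean distance of plain FCM."""
    rng = np.random.default_rng(seed)
    V = X[rng.choice(len(X), c, replace=False)]        # initial centers
    for _ in range(n_iter):
        K = np.exp(-((X[:, None] - V[None]) ** 2).sum(-1) / (2 * sigma ** 2))
        d2 = np.maximum(2.0 * (1.0 - K), 1e-12)        # kernel-space distance^2
        U = 1.0 / ((d2[:, :, None] / d2[:, None, :]) ** (1 / (m - 1))).sum(-1)
        W = (U ** m) * K                               # kernel-weighted memberships
        V = (W.T @ X) / np.maximum(W.sum(0)[:, None], 1e-12)
    return U, V
```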
Abstract: Traditional three-dimensional visual comfort assessment algorithms generally require a large amount of training data with subjective mean opinion scores to train a regression model. To solve this problem, we propose a new visual comfort assessment model based on a multiple kernel learning (MKL) method. First, considering that humans tend to make preference judgments between two stereoscopic images in terms of visual comfort, we select representative stereoscopic images to generate preference stereoscopic image pairs (PSIPs) and construct a PSIP training set with a preference label set. Second, we extract multiple disparity statistics and a feature type derived by estimating the neural activity associated with horizontal disparity processing. A preference classification model is then trained by the MKL method, taking as input the differential feature vector and the corresponding preference label of each PSIP. In addition, a mapping strategy between the classification probability and the final predicted visual comfort score is analyzed. Experimental results demonstrate that the proposed method obtains a Pearson linear correlation coefficient (PLCC) above 0.84 and a Spearman rank correlation coefficient (SRCC) above 0.80, superior to those of existing representative regression methods; cross-database testing further validates that it achieves better PLCC and SRCC performance than support vector regression models. Compared with traditional regression algorithms, the proposed method predicts visual comfort more accurately.
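The preference-pair construction can be sketched as follows; a single RBF SVM stands in for the paper's multiple-kernel learner, and features, mos, and anchor_feats are hypothetical inputs (per-image feature vectors, mean opinion scores, and representative anchor images).

```python
import numpy as np
from itertools import combinations
from sklearn.svm import SVC

def build_psips(features, mos):
    """Turn per-image features and mean opinion scores into preference
    pairs: label +1 if the first image is the more comfortable one."""
    X, y = [], []
    for i, j in combinations(range(len(features)), 2):
        X.append(features[i] - features[j])       # differential feature vector
        y.append(1 if mos[i] > mos[j] else -1)
    return np.array(X), np.array(y)

def comfort_score(model, test_feat, anchor_feats):
    """Map pairwise win probabilities against anchors to a scalar score."""
    diffs = test_feat[None] - np.asarray(anchor_feats)
    col = list(model.classes_).index(1)
    return model.predict_proba(diffs)[:, col].mean()

# model = SVC(kernel='rbf', probability=True).fit(*build_psips(features, mos))
```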
Abstract: Automatic airport recognition from remote sensing images is popular in the remote-sensing community. However, most studies extract objects from cropped images, because performing the same task on a full high-resolution image is time consuming. Most studies extract straight lines, which are assumed to correspond to airport runways, and then locate the target; however, straight lines in a high-resolution image arise not only from airport runways but also from highways, railways, external walls of large-scale plants, land boundaries, mountains, strata, and other structures. Methods for distinguishing runways among the extracted straight lines are rarely studied. Furthermore, existing methods focus on a single large airport instead of small ones, and few handle multiple targets. China has recently launched a high-resolution mapping satellite with global data-acquisition coverage and real-time data transmission capability. Beyond surveying and mapping, the satellite can also be used for land cover classification, land object recognition, and resource exploration. In this paper, Mapping Satellite imagery (6 000 × 6 000 pixels), which has rich details and complex land covers, is used as the source for airport detection. Traditional spatial filtering and edge-detection methods tend to fail in airport detection because of noise and false edges. To solve this problem, an edge-tracing model combined with speeded-up robust features (SURF) detection is proposed. Edge tracing is based on edge extraction. First, filtering is applied to suppress noise. Then, the gradient magnitude and gradient direction along the normal are calculated. Pixels that are not local maxima of the gradient, identified by an improved non-maximum suppression (NMS) method, are deleted to obtain a single-pixel-wide edge image, from which a contour line is extracted. Finally, airports are detected through line detection and SURF. Airports in four Mapping Satellite images are successfully recognized using the presented method. The improved NMS and edge-contour tracing are helpful for edge extraction: most object edges are successfully detected and are considerably clear, with single-pixel width. After line detection for ground objects, a SURF detector is applied to recognize airports in the image. Two airports of different sizes, one large and one small, are successfully detected and extracted from two Mapping Satellite images, which demonstrates the effectiveness of the proposed method. The proposed airport detection method is suitable not only for Mapping Satellite data but also for data from other remote-sensing satellites. The improved NMS method extracts more accurate and thinner edges while suppressing false edges, which is extremely helpful for airport detection. The edge-tracing model extracts contours in different directions accurately, thereby improving line-detection accuracy. Many ground targets, such as roads, railways, croplands, walls of factories and mines, mountains, and strata, also appear as line features; thus, SURF is used to distinguish airports from these objects. Our experiments show that many key points extracted by SURF lie on the intersections of different lines inside airports.
Based on this observation, the airport can be identified by scanning images to find lines and key points.
Keywords: improved non-maximum suppression; edge contour tracing extraction; speeded-up robust features (SURF) detection; automatic airport extraction; Mapping Satellite imagery
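A rough reconstruction of the final detection stage is sketched below with OpenCV: Canny stands in for the improved NMS and edge tracing, probabilistic Hough for the line detection, and SURF key points lying near detected lines flag airport candidates. Note that SURF lives in the opencv-contrib nonfree module (cv2.xfeatures2d); ORB could substitute where it is unavailable.

```python
import cv2
import numpy as np

def airport_candidates(gray):
    """Flag SURF key points that lie near detected straight lines."""
    edges = cv2.Canny(gray, 50, 150)          # stand-in for improved NMS edges
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=80,
                            minLineLength=60, maxLineGap=5)
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
    keypoints = surf.detect(gray, None)
    mask = np.zeros_like(gray)                # rasterize lines into a mask
    if lines is not None:
        for x1, y1, x2, y2 in lines[:, 0]:
            cv2.line(mask, (x1, y1), (x2, y2), 255, 5)
    hits = [kp for kp in keypoints if mask[int(kp.pt[1]), int(kp.pt[0])] > 0]
    return lines, hits
```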
Abstract: Most existing vehicle logo recognition (VLR) approaches simply combine traditional image features with different classifiers, such as template matching, HOG (histogram of oriented gradients)/SVM (support vector machine), HOG/KNN (k-nearest neighbor), SIFT (scale-invariant feature transform)/SVM, and radial Chebyshev moments/SVM, but they do not consider the structural features of vehicle logo images. Vehicle logo images have distinct structure information, which is a significant clue for VLR. Considering both gray and structural features, this study proposes a new VLR approach based on pixel-pair features randomly sampled from foreground and background skeleton areas. First, a standard vehicle logo image is partitioned into foreground and background areas. Second, the skeletons of both the foreground and background are extracted, contracting the two areas; pixels in skeleton areas are more stable. Third, pixels are randomly selected from the two skeleton areas and cross-matched to form pixel pairs, which yields more pairs. Then, through pixel-pair validation, the accepted pairs are selected to express the vehicle logo features. Each feature expresses the similarity of the local areas centered on the two pixels, reflecting the gray relationship between logo and background areas after the imaging process. Pixel-pair validation follows two crucial rules: a pixel pair should have the same similarity relationship across different samples of the same class, and different similarity relationships across samples of different classes. Last, a connection relationship is introduced during classification, through which independent pixel pairs that contribute to recognition can be aggregated. The experimental data consist of 19 044 vehicle logo images captured by real-world surveillance cameras, called Test Set 1. To test the performance of different methods under weak illumination, 2 326 weakly illuminated vehicle logo images are selected from Test Set 1 to form Test Set 2. Experiments show that, compared with recognition based on other image features, the proposed feature achieves higher recognition performance: a 95.7% recognition rate on Test Set 1 and an 87.2% recognition rate on Test Set 2 under weak illumination. The pixel-pair feature extraction process fully considers the logo shape, which makes the feature distinctive and exclusive. Experiments show that our method attains a higher recognition rate than traditional VLR methods and is strongly robust, especially under weak illumination.
Keywords: vehicle logo recognition; pixel-pair feature; foreground and background areas; skeleton area; random sampling; cross-matching
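A minimal sketch of the sampling step, assuming a binary logo mask and using scikit-image skeletonization; the patch-mean comparison is a simplified stand-in for the paper's local-area similarity, and the validation of pairs across training samples is omitted.

```python
import numpy as np
from skimage.morphology import skeletonize

def sample_pixel_pairs(logo_mask, n_pairs=200, seed=0):
    """Randomly cross-match foreground-skeleton and background-skeleton pixels."""
    rng = np.random.default_rng(seed)
    fg = np.argwhere(skeletonize(logo_mask))
    bg = np.argwhere(skeletonize(~logo_mask))
    return list(zip(fg[rng.integers(0, len(fg), n_pairs)],
                    bg[rng.integers(0, len(bg), n_pairs)]))

def pair_feature(gray, p, q, r=2):
    """One binary feature: is the foreground patch brighter than the
    background patch? (Patch means stand in for local-area similarity.)"""
    def patch_mean(c):
        y, x = c
        return gray[max(y - r, 0):y + r + 1, max(x - r, 0):x + r + 1].mean()
    return patch_mean(p) > patch_mean(q)
```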
Abstract: Controlling household appliances by hand gesture is one of the trends in smart home systems, and hand-gesture recognition is a developing field in human-computer interaction. In the past decades, many gesture-recognition algorithms have been developed for tracking and recognizing various hand gestures. These algorithms can be categorized into two classes, static-based and dynamic-based. Unfortunately, both types place high requirements on camera devices and environments, such as the need for RGB cameras and fixed scenes, which makes it difficult to adapt to complex home environments, especially with wide-field cameras or heavy interference. This paper proposes a dynamic hand-wave recognition algorithm that responds to periodically moving objects in video sequences. A hand wave is periodic, and its frequency is relatively stable; this property makes hand-wave recognition possible in the large scenes provided by a wide-field camera. To detect this action, the hand wave is regarded as a periodically moving object, and a detection algorithm that responds to periodically changing pixels is proposed. Detection is achieved through a short filter (SF) and a long filter (LF): the SF smooths only a few neighbors of the current video frame, while the LF considers more frames. By comparing the SF and LF outputs, the state of each pixel, i.e., whether it is changing periodically, can be determined. By connecting these pixels, the periodically moving area, which is the hand-wave candidate, is confirmed. Then, a sophisticated hand-gesture recognition algorithm is applied to confirm that the candidate is indeed a hand. In practice, finding small objects in a high-resolution image with a complex background is one of the most challenging problems in computer vision; however, if the moving state of the object, such as periodic motion, is known in advance, detection becomes much easier. Static hand gestures are therefore not considered in this paper. To fully evaluate the proposed algorithm, it is applied to turning a room light on and off in five challenging scenes, including an actual household environment. An experimenter waves a hand in front of the camera until the light is triggered (whether switching on or off), with the waving action lasting more than four seconds. If the light is triggered within four seconds, the success count is incremented; if the light is triggered by other actions, such as talking or walking, the false-trigger count is incremented. Experimental results show that, compared with state-of-the-art algorithms, the proposed algorithm increases the success count by about 3% and decreases the false-trigger count by 44% and the computation time by 0.4 seconds. The results show that the proposed method meets the needs of practical applications. In addition, the method is not based on a moving-object detection algorithm and has low computation cost, which means it can run efficiently at high resolution or be conveniently migrated to embedded systems.
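A minimal sketch of the SF/LF idea, under our own simplifying assumptions: box averaging for both filters, and SF/LF crossing counts as the periodicity test, without the decay of old crossings that a real system would need.

```python
import numpy as np
from collections import deque

class PeriodicPixelDetector:
    """SF smooths a few recent frames, LF smooths many; pixels whose SF
    output repeatedly crosses their LF output are flagged as periodic."""
    def __init__(self, short_n=3, long_n=25, min_crossings=4):
        self.short = deque(maxlen=short_n)
        self.long = deque(maxlen=long_n)
        self.min_crossings = min_crossings
        self.sign = None

    def update(self, frame):
        f = frame.astype(np.float32)
        self.short.append(f)
        self.long.append(f)
        sf = np.mean(self.short, axis=0)       # short-filter output
        lf = np.mean(self.long, axis=0)        # long-filter output
        sign = sf > lf
        if self.sign is None:
            self.sign = sign
            self.crossings = np.zeros(frame.shape, np.int32)
        self.crossings += sign != self.sign    # an SF/LF crossing = one oscillation
        self.sign = sign
        return self.crossings >= self.min_crossings   # periodic-pixel mask
```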
Abstract: Research on face recognition models has a long history, with successful applications in cosmetic surgery and other fields. Meanwhile, facial similarity measures have been explored in computer vision and pattern recognition, and applications such as face recognition, face retrieval, and similar-face search are widely used. Existing methods that measure face similarity are typically confined to calculating image similarity without incorporating the face cognition patterns used by human beings; their results are therefore usually suboptimal from the perspective of human cognitive habits. Based on face cognition patterns, each kind of facial feature, such as the eyes, nose, mouth, and facial shape, is divided into several common types; for example, eyes are divided into 10 types, such as phoenix, round, and triangular. After analyzing the feature vectors of the different types, we pick 20 images from the CAS-PEAL-R1 database for each common type of every facial feature, locate feature points, and calculate feature values. We use the statistics of these results to construct facial-feature classification models, i.e., contour similarity measurement models for facial features. Contour information cannot capture certain facial details, e.g., double eyelids or a high nose bridge. To measure the similarity of details between two faces, we employ a circular local binary pattern operator to calculate the texture similarity of corresponding facial features. A combination of contour and texture similarities is used as the criterion for similar-face search. Our test face database contains 80 frontal neutral and head-angle face images collected from the Internet; these images are different from the aforementioned training images. Our target face database consists of two parts: 1 040 frontal images from the CAS-PEAL-R1 database and 102 star identification photos collected from the Internet. Only a few CAS-PEAL database images are allowed to be presented in papers, so the extra star photos are added to the target face database. We use our method and the method provided by Face++, which represents the state of the art in similar-face search, to retrieve the most similar faces for each test image from the target face database. Statistically, the overall accuracy of our method is higher than that of Face++; the Top-1 and Top-2 retrieval results are clearly better, with accuracy gaps both exceeding 12%. Experimental results show that the search results of our method are more satisfactory from the perspective of human cognitive habits. Our method can thus be applied to similar-face search for frontal neutral and head-angle face images. Moreover, the proposed cognition-model-based image search approach can also be applied in the business sector, e.g., image-based similarity search of online goods.
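The texture half of the similarity criterion can be sketched with scikit-image's circular LBP operator; histogram intersection is our choice of histogram comparison, not necessarily the paper's.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_similarity(region_a, region_b, P=8, R=1):
    """Texture similarity of two corresponding facial-feature regions:
    histogram intersection of circular uniform-LBP codes (1.0 = identical)."""
    bins = P + 2                              # number of 'uniform' codes
    ha, _ = np.histogram(local_binary_pattern(region_a, P, R, 'uniform'),
                         bins=bins, range=(0, bins), density=True)
    hb, _ = np.histogram(local_binary_pattern(region_b, P, R, 'uniform'),
                         bins=bins, range=(0, bins), density=True)
    return np.minimum(ha, hb).sum()
```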
Abstract: The linear-nonlinear-Poisson (LNP) model provides a good interpretation of neuron responses, and one of its important links is linear feature extraction. Focusing on the issue that the traditional information-theoretic spike-triggered average and covariance (iSTAC) algorithm cannot exactly describe neuron features or extract their motion characteristics, especially for low-dimensional stimuli, this study improves the iSTAC algorithm and proposes a new algorithm for extracting neural filtering features. Statistics of the non-spike-triggered stimuli are introduced to build a more accurate objective function for the neural-filter feature subspace and to enhance the noise resistance of the system. To optimize the solution space and accelerate convergence, a variable metric method is adopted to maximize the objective function. Experimental results on linear-filter recovery under different nonlinear conditions show that the new algorithm matches the traditional iSTAC algorithm for high-dimensional stimuli and achieves significant improvement for stimuli of fewer than 6 500 dimensions. Furthermore, the results show that the new algorithm outperforms the spike-triggered average (STA) and spike-triggered covariance (STC) algorithms. The proposed algorithm has better adaptability and greater robustness, and can be applied to establish a complete extraction model for video motion features based on visual characteristics.
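For context, the baseline spike-triggered average that iSTAC-style methods refine, including the non-spike-triggered (raw) ensemble statistics that the improved algorithm exploits, can be sketched as follows.

```python
import numpy as np

def spike_triggered_average(stimulus, spikes, lag=10):
    """Mean stimulus segment preceding each spike, centered by the mean
    of the full (non-spike-triggered) stimulus ensemble. Whitening by the
    raw covariance (not shown) yields the usual linear-filter estimate."""
    idx = np.flatnonzero(spikes)
    trig = np.array([stimulus[t - lag:t] for t in idx if t >= lag])
    raw = np.array([stimulus[t - lag:t] for t in range(lag, len(stimulus))])
    return trig.mean(axis=0) - raw.mean(axis=0)
```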
Abstract: Image shape perception is a key aspect of electrostatic tactile interaction in virtual reality and human-computer interaction. The approach is widely applicable to phones and tablets and has good portability. Synchronous visual-tactile feedback should be rendered to achieve tactile simulation of image texture when people use such devices to interact with virtual environments. The goal of electrostatic tactile rendering is to develop rendering algorithms that provide realistic and varied tactile sensations of images to consumers, and the rendering of image-texture details directly affects the effectiveness of electrostatic tactile feedback. To provide high-clarity and varied tactile sensations in an electrostatic tactile system, a new rendering model based on local texture features is proposed. First, we utilize a local Fourier transform to enhance texture-detail features: the frequency-domain components of shape (i.e., edge features) and of local texture are separated via their Fourier coefficients, and the local texture properties are characterized by different Fourier coefficients. Then, to realize local texture rendering on tactile devices, a mapping model between the local texture features and the driving signal is set up: local texture features are scaled to corresponding electrostatic force levels, and the local Fourier coefficients of texture are mapped to lateral friction forces. This mapping model builds a new relationship between texture features and electrostatic tactile output, and provides a basis for generating different tactile sensations for different images. Finally, we control the actuation signal magnitude to generate different electrostatic tactile intensities according to a psychophysical model of electrostatic force versus actuation signal. This model relates the perceived force to the controlled voltage applied to the self-developed electrostatic tactile device, so that different tactile feelings are displayed depending on the local texture feature being touched. We conducted contrastive experiments on tactile texture perception to validate the rendering algorithm. The results indicate that 62.5% of the participants preferred local-texture force feedback over gradient-based force feedback, and participants found the texture perception provided by the local-texture rendering algorithm clearer than that of the gradient-based algorithm. The local-texture rendering algorithm can also display fine texture details; with clearer texture, highly delicate electrostatic tactile sensations are perceived by users. The rendering model can simulate the feeling of texture and edges for most images. This study efficiently enhances local texture features by separating texture and shape in the frequency domain, and the mapping model between electrostatic tactile output and local texture features ensures realistic tactile feedback. This paper thus presents a generalized tactile rendering model for displaying textures on electrostatic tactile systems; immersion in interactive virtual environments can be effectively improved by local-texture tactile interaction technology.
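The separation of shape and texture in the frequency domain can be sketched blockwise as below; the 5×5 low-frequency cut-off and the linear energy-to-voltage mapping are hypothetical stand-ins for the paper's calibrated psychophysical model.

```python
import numpy as np

def local_texture_energy(gray, block=16, cut=2):
    """Blockwise FFT: zero the central (low-frequency, shape-carrying)
    coefficients and keep the mean magnitude of the texture remainder."""
    h, w = gray.shape
    out = np.zeros((h // block, w // block))
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            F = np.fft.fftshift(np.fft.fft2(gray[i:i + block, j:j + block]))
            c = block // 2
            F[c - cut:c + cut + 1, c - cut:c + cut + 1] = 0   # drop shape terms
            out[i // block, j // block] = np.abs(F).mean()
    return out

def to_voltage(energy, v_min=20.0, v_max=120.0):
    """Hypothetical linear energy-to-voltage mapping (placeholder for a
    calibrated psychophysical model)."""
    e = (energy - energy.min()) / (np.ptp(energy) + 1e-12)
    return v_min + e * (v_max - v_min)
```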
Abstract: Saliency detection is a fundamental part of computer vision applications. Its goal is to obtain a high-quality saliency map that identifies the pixels or regions of an image that attract human visual attention the most. Recently, saliency detection in RGB-D images has become increasingly popular, and depth information has proven to be a fundamental element of human vision. Most existing saliency detection methods concentrate on detecting salient objects in 2D images and cannot be used directly on RGB-D images. In this paper, a new RGB-D saliency detection approach based on feature integration and a saliency-depth (S-D) probability correction method is proposed. The proposed method considers image features at both the 2D and RGB-D levels, extracting color and depth features that complement each other. First, the method extracts color and depth features and takes the four image boundaries as background seeds to compute an initial saliency map by the manifold-ranking algorithm. Second, according to the RGB saliency map and the depth map, the method computes the S-D correction probability. Third, the method computes a depth saliency map and corrects it using the S-D correction probability. After correction, the method selects foreground seeds through image thresholding, and a final saliency map is optimized by applying the manifold-ranking algorithm again. In our experiments, we evaluate our method and six state-of-the-art methods on a large and prevalent RGB-D image dataset containing 1 000 images. Experimental results indicate that the saliency maps from our method are much closer to the ground truth than those of the other methods. We also plot precision-recall curves, from which we conclude that the proposed method performs better than the competing methods at the same recall. In addition, we evaluate the time complexity of our algorithm: our method processes a single image in 2.150 seconds, faster than most of the other methods. In summary, we propose a novel RGB-D saliency detection approach that combines color features from the RGB image and depth features from the depth image: the depth features guide the RGB saliency ranking, and the RGB saliency results are in turn used to correct the depth saliency results. Experimental results demonstrate that the manifold-ranking approach with feature integration fuses depth and color features effectively and enables the two components to complement each other; with the help of the S-D probability correction, the RGB saliency results can effectively guide depth saliency detection.
Keywords: saliency detection; S-D probability correction; feature integration; manifold ranking; RGB-D; color feature; depth feature
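The manifold-ranking step used twice in the pipeline has a well-known closed form, f* = (I − αS)⁻¹y, with S the symmetrically normalized affinity matrix. A minimal sketch, assuming the superpixel affinity matrix W (built from the color and depth features) is given:

```python
import numpy as np

def manifold_ranking(W, seed_idx, alpha=0.99):
    """Closed-form ranking f* = (I - alpha * S)^-1 y on a superpixel graph,
    with S the symmetrically normalized affinity and y the seed indicator."""
    d = W.sum(1)
    S = W / np.sqrt(np.outer(d, d))           # D^-1/2 W D^-1/2 (assumes d > 0)
    y = np.zeros(len(W))
    y[seed_idx] = 1.0
    f = np.linalg.solve(np.eye(len(W)) - alpha * S, y)
    return (f - f.min()) / (f.max() - f.min() + 1e-12)
```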
Abstract: Traditional clustering algorithms usually exploit spectral information while ignoring spatial information, which makes them susceptible to noise. In this study, we propose a hyperspectral image clustering algorithm based on simple linear iterative clustering (SLIC) and density peaks (DP) to solve this problem. Using SLIC, we segment the hyperspectral image into superpixels and extract the spectrum within each superpixel. According to the spectral characteristics of the extracted superpixels, we compute density peaks to find superpixel cluster centers, and clustering is then performed through the relationship between the original pixels and the superpixel clusters. The robustness and accuracy of the SLIC-DP algorithm are evaluated on simulated hyperspectral data and two sets of real hyperspectral images. SLIC-DP reduces the variance by 61.86% and 41.61% compared with K-means and SLIC-K-means, respectively, showing significant robustness. On the Salinas-A and Indian Pines hyperspectral images, the adjusted Rand index (ARI) of SLIC-DP is 0.777 1 and 0.325 7, improvements of 10.71% and 78.86% over the K-means algorithm and of 5.01% and 25.27% over the SLIC-K-means algorithm, which means SLIC-DP is more accurate than the other algorithms. The SLIC-DP algorithm is strongly robust and more accurate; jointly exploiting spectral and spatial information yields good performance in clustering hyperspectral images.
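A compact sketch of the SLIC-DP pipeline with scikit-image (0.19+ for channel_axis); compactness, dc, and the fixed cluster count are hypothetical values that would need tuning, and nearest-center assignment simplifies the original DP assignment rule.

```python
import numpy as np
from skimage.segmentation import slic

def slic_dp(cube, n_segments=200, dc=0.5, n_clusters=6):
    """SLIC-DP sketch for a hyperspectral cube of shape (H, W, B):
    1) SLIC superpixels, 2) mean spectrum per superpixel, 3) density-peak
    centers (high local density rho and large separation delta),
    4) assign superpixels, then pixels, to the nearest center."""
    labels = slic(cube, n_segments=n_segments, compactness=0.1,
                  channel_axis=-1, start_label=0)
    spectra = np.array([cube[labels == k].mean(0)
                        for k in range(labels.max() + 1)])
    D = np.linalg.norm(spectra[:, None] - spectra[None], axis=-1)
    rho = np.exp(-(D / dc) ** 2).sum(1)                 # local density
    delta = np.array([D[i, rho > rho[i]].min() if (rho > rho[i]).any()
                      else D[i].max() for i in range(len(rho))])
    centers = np.argsort(rho * delta)[-n_clusters:]     # decision-graph peaks
    assign = np.argmin(D[:, centers], axis=1)           # superpixel -> center id
    return assign[labels]                               # per-pixel cluster map
```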