Abstract: Restoring and preserving edge details is critical in image defogging. In existing classical model-based defogging methods, the medium transmittance is computed directly, i.e., transmittance estimation is followed by image matting refinement, so the computational complexity is high. To reduce the time complexity, the transmittance is instead computed indirectly by means of bilateral filtering. In the bilateral-filtering-based solution, the atmospheric scattering function and the atmospheric light value are estimated relatively accurately, so the transmittance can be computed indirectly. The proposed algorithm avoids soft matting for transmittance refinement, which improves real-time performance. Experiments are performed on two sets of outdoor foggy images to compare the transmittance maps, restoration effects, and running times. With the proposed method, the transmission distribution map is clearer, the block effect of transmittance estimation is suppressed, and image edges are smoothed during transmittance refinement. The running time of the proposed algorithm is only 1.803 seconds for an image of size 608×456. In the global enhancement of foggy images, local details are restored and edges are preserved, so the proposed algorithm is suitable for image detection systems. The experimental results show that the proposed algorithm protects edge details while improving time efficiency.
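A minimal sketch of the indirect transmittance pipeline described above, assuming a dark-channel-style coarse estimate that is refined with OpenCV's bilateral filter in place of soft matting; the parameter values (omega, patch size, filter sigmas) are illustrative assumptions, not the paper's settings.

```python
import cv2
import numpy as np

def defog(img_bgr, omega=0.95, patch=15, t0=0.1):
    I = img_bgr.astype(np.float64) / 255.0
    # Coarse dark channel: per-pixel channel minimum, then local minimum filter.
    dark = cv2.erode(I.min(axis=2), np.ones((patch, patch)))
    # Atmospheric light A: mean colour of the brightest 0.1% dark-channel pixels.
    n = max(1, dark.size // 1000)
    idx = np.unravel_index(np.argsort(dark, axis=None)[-n:], dark.shape)
    A = I[idx].mean(axis=0)
    # Coarse transmittance from the scattering model I = J*t + A*(1 - t).
    dark_norm = cv2.erode((I / A).min(axis=2), np.ones((patch, patch)))
    t = 1.0 - omega * dark_norm
    # Edge-preserving refinement by bilateral filtering (replaces soft matting).
    t = cv2.bilateralFilter(t.astype(np.float32), d=9, sigmaColor=0.1, sigmaSpace=15)
    # Recover scene radiance J.
    J = (I - A) / np.clip(t, t0, 1.0)[..., None] + A
    return np.clip(J * 255, 0, 255).astype(np.uint8)
```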
Abstract: Text detection methods based on the maximally stable extremal regions (MSERs) algorithm are now widely used in natural scene text detection. However, text regions in natural scene images can have complex backgrounds, unlike those in documents and business cards, and cannot be extracted accurately by the MSERs algorithm alone. To overcome this problem, this study proposes a text detection method for natural scene images that integrates maximally stable color regions (MSCRs) with MSERs. Character candidates are first extracted with both the MSCRs and MSERs algorithms. Part of the non-character candidates are then eliminated according to geometric information. Texture features are exploited to distinguish character from non-character candidates, and a random forest character classifier is trained; further non-character candidates are eliminated according to its classification results. Finally, the single character candidates are grouped into text regions according to color similarity and geometric adjacency. The proposed method achieves a recall rate of 71.9%, a precision rate of 84.1%, and an f-score of 77.5% on the ICDAR 2013 database; the recall rate and f-score improve on those of other state-of-the-art methods. The proposed text detection method is robust for natural scene images, and the experimental results show its effectiveness.
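A hedged sketch of the grayscale MSER candidate-extraction and geometric-filtering stages using OpenCV; the MSCR color stage, the texture features, and the random forest classifier are omitted, and the geometric thresholds below are assumptions.

```python
import cv2

def extract_candidates(img_bgr):
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    mser = cv2.MSER_create(5, 30, 8000)  # delta, min_area, max_area (assumed)
    regions, _ = mser.detectRegions(gray)
    candidates = []
    for pts in regions:
        x, y, w, h = cv2.boundingRect(pts)
        aspect = w / float(h)
        fill = len(pts) / float(w * h)
        # Reject regions whose geometry is implausible for a character.
        if 0.1 < aspect < 10 and fill > 0.2:
            candidates.append((x, y, w, h))
    return candidates
```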
Abstract: Foreground detection is a key research area in the field of video surveillance. The local binary similarity segmenter (LOBSTER) algorithm combines the visual background extractor (ViBe) algorithm with the local binary similarity patterns (LBSP) feature and obtains excellent detection performance in general scenes. However, it adapts poorly and produces much detection noise in dynamic backgrounds. An improved LOBSTER algorithm is proposed to solve these problems. At the model initialization stage, the LBSP value of each pixel is calculated; the gray and LBSP values of the pixel are then added to each sample of the color background model and the LBSP background model, respectively, which enhances the description of the background model. At the pixel classification stage, the standard deviation calculated in the neighborhood of each pixel is used as a measurable index of the background complexity at that pixel. Adaptively adjusting the classification thresholds of the color and LBSP background models according to the background complexity lowers the noise in the foreground. At the model updating stage, the improved LOBSTER algorithm still follows the conservative update strategy. When a pixel is classified as background, it is used to update its own background model. If the background complexity of the pixel is smaller than a certain threshold, the pixel is also added to the background model of a neighboring pixel with a probability of 1/φ, where φ is generally set to 16. If the background complexity is larger than the threshold, a new pixel classified as background is randomly selected in the neighborhood, and the selected pixel is added to the pixel's own background model with a probability of 1/φ. This adaptive model-updating strategy improves the adaptability to dynamic backgrounds. Extensive qualitative analyses and quantitative calculations are presented on the ChangeDetection database for the improved LOBSTER algorithm. The noise in the foreground image of the improved algorithm is less than that of the ViBe and LOBSTER algorithms. In terms of the PCC index, the improved algorithm is higher by 0.736% to 7.56% than the ViBe algorithm and by approximately 0.77% to 12.47% than the LOBSTER algorithm. In terms of the FPC index, the value of the improved algorithm is less than 1% of those of the ViBe and LOBSTER algorithms. Simulation results show that the improved LOBSTER algorithm performs better than the conventional ViBe and LOBSTER algorithms in dynamic conditions; thus, our method achieves a higher accuracy rate and stronger robustness in foreground detection.
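A rough sketch of the complexity-adaptive classification idea: the local standard deviation serves as the background-complexity index and scales a per-pixel classification threshold. The base threshold, scale factor, and window size are assumed values, and the LBSP channel is not reproduced.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def background_complexity(gray, win=5):
    g = gray.astype(np.float64)
    mean = uniform_filter(g, win)
    mean_sq = uniform_filter(g * g, win)
    return np.sqrt(np.maximum(mean_sq - mean * mean, 0.0))  # local std dev

def classify(frame, bg_samples, base_thresh=20.0, k=1.5, min_matches=2):
    # bg_samples: (N, H, W) stack of background model samples per pixel.
    sigma = background_complexity(frame)
    thresh = base_thresh + k * sigma          # raise threshold where background is busy
    dist = np.abs(bg_samples - frame.astype(np.float64))
    matches = (dist < thresh).sum(axis=0)     # samples within the adaptive threshold
    return matches < min_matches              # True = foreground
```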
Abstract: Pedestrian detection is a crucial research topic in computer vision and pattern recognition. The detection flow includes preprocessing, feature extraction, classifier training, and detection. Various human detection algorithms, which can be categorized as template matching or machine learning, have been developed over the past decades. Machine learning-based algorithms are the primary pedestrian detection method, but their speed is problematic. Given the low detection speed of the classic DPM model, the current study focuses on the star-cascade DPM (casDPM) model, which integrates PCA technology. The detection speed of casDPM is significantly higher than that of the classic DPM model; however, casDPM has lower detection precision and a higher log-average miss rate (LAMR) in pedestrian detection. We therefore propose an improved pedestrian detection approach based on the casDPM model to detect pedestrians accurately. Objectness proposal methods can be classified as grouping or window-scoring methods. To produce a small set of candidate object windows, we utilize a binarized normed gradients method that trains a generic objectness measure; the set of generated features is called BING. Non-maximum suppression (NMS) is an important post-processing step, and the common NMS follows a greedy strategy that only utilizes area information and disregards the detection score generated by the model. The following strategies are therefore employed: first, to obtain the confidence of regions with a low detection score in the casDPM model, the object score is combined with the candidate object area information determined by the objectness measure, and windows with a confidence above a given threshold are retained, which helps reduce negative windows. Second, the scores of the detection windows are used to modify the original NMS algorithm, which only utilizes area information in the casDPM model, to reduce the high false-positive rate. We propose a confluent cas-WNms-BING model that integrates the two methods to fully utilize the detection window scores and the candidate objects proposed by the objectness measure. Experiments on the INRIA dataset were conducted, and the results were compared with those of the casDPM model: the average precision of the proposed model increased by 1.74%, the LAMR decreased by 4.45%, and the speed increased by more than five-fold. These results indicate that the proposed algorithm is effective and applicable to actual pedestrian detection. The algorithm is robust against human deformation, complex backgrounds, and occlusion; it decreases the LAMR and improves detection precision.
Keywords: star-cascade DPM model; pedestrian detection; non-maximum suppression; object area
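A sketch of the modified NMS idea from the abstract above: the detection score is fused with an objectness score before greedy suppression, rather than ranking by area overlap alone. The linear fusion rule and the value of alpha are illustrative assumptions.

```python
import numpy as np

def score_aware_nms(boxes, scores, obj_scores, iou_thresh=0.5, alpha=0.7):
    # boxes: (N, 4) arrays of [x1, y1, x2, y2]; scores/obj_scores: (N,).
    # Combined confidence: detection score fused with the objectness score.
    conf = alpha * scores + (1 - alpha) * obj_scores
    order = np.argsort(conf)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the top box against the remaining boxes.
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                 (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_r - inter)
        order = order[1:][iou <= iou_thresh]
    return keep
```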
Abstract: Multi-sensor image registration has three basic problems: (1) the feature points detected from multi-sensor images are insufficient; (2) the spatial distribution of the detected feature points is unbalanced; (3) the detected points cannot easily be matched with high performance. A new CCD-IR image registration algorithm is proposed in this study to address these problems; it includes an adaptive Harris corner detection approach and a new feature point matching measure function based on normalized mutual information and gradient orientation. In the adaptive feature point detection, the quantity and spatial distribution of the feature points are jointly taken as the objective function, and different detection thresholds are then automatically assigned to the feature points according to their spatial positions. Moreover, normalized mutual information is combined with gradient orientation in the feature matching process to construct a new matching measure function. Experimental results show that the proposed adaptive feature detection approach can provide sufficient and uniformly distributed feature points. The proposed new matching measure function improves the point matching success rate by approximately 20% and decreases the registration error by approximately 50%. The proposed CCD-IR image registration algorithm is shown to have low complexity, high accuracy, and practicability; thus, it can be applied to CCD-IR image fusion.
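A minimal sketch of the normalized mutual information term of the matching measure; the paper's full measure additionally combines a gradient-orientation term, whose form and weighting are not reproduced here.

```python
import numpy as np

def nmi(patch_a, patch_b, bins=32):
    # Joint histogram of the two patches, normalized to a joint distribution.
    hist_2d, _, _ = np.histogram2d(patch_a.ravel(), patch_b.ravel(), bins=bins)
    pxy = hist_2d / hist_2d.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)

    def entropy(p):
        p = p[p > 0]
        return -(p * np.log2(p)).sum()

    # NMI = (H(A) + H(B)) / H(A, B); larger means a better match.
    return (entropy(px) + entropy(py)) / entropy(pxy.ravel())
```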
Abstract: A new general equalization fuzzy C-means clustering algorithm that targets the shortcomings of existing, non-convergent variants is proposed and applied to image segmentation. The proposed general equalization fuzzy clustering algorithm is also extended into the reproducing kernel Hilbert space, which improves the universality of this class of algorithms. The limit expression properties of the Schweizer T-norm are applied to construct the objective function of the new general equalization fuzzy C-means clustering based on the objective function of existing variants. The Lagrange multiplier method is then adopted to obtain the iterative formulas of the fuzzy membership and clustering centers for the modified general equalization fuzzy C-means clustering. The iterative expression of the clustering centers is modified to further improve performance; the modified clustering algorithm improves clustering performance significantly. Finally, a nonlinear function is adopted to map data samples from Euclidean space to the high-dimensional Hilbert feature space, yielding the kernel-space general equalization fuzzy C-means clustering algorithm. The kernel-space general equalization fuzzy C-means clustering algorithms can reduce the misclassification rate of image segmentation by 10% to 30% compared with the existing fuzzy compactness and separation (FCS) and fuzzy C-means clustering with local information and kernel metric (KFLICM) algorithms. Experimental results on the clustering analysis of the Iris data and on gray image segmentation indicate that the proposed general equalization fuzzy C-means clustering algorithm is efficient, and its modified version obtains more satisfactory clustering quality and segmentation effects than existing fuzzy C-means clustering algorithms. The proposed algorithm overcomes the shortcomings of existing general equalization fuzzy C-means clustering algorithms and improves clustering performance, making it suitable for complex data analysis.
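For reference, a baseline fuzzy C-means iteration in NumPy showing the membership and center updates that the general equalization variant modifies; the Schweizer T-norm objective and the paper's modified center expression are not reproduced here.

```python
import numpy as np

def fcm(X, c=3, m=2.0, iters=100, tol=1e-5, seed=0):
    # X: (N, D) data matrix; c clusters; m > 1 is the fuzzifier.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), c, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None], axis=2) + 1e-10
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1)).
        u = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1))).sum(axis=2)
        um = u ** m
        # Center update: v_k = sum_i u_ik^m x_i / sum_i u_ik^m.
        new_centers = (um.T @ X) / um.sum(axis=0)[:, None]
        if np.abs(new_centers - centers).max() < tol:
            break
        centers = new_centers
    return u, centers
```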
Abstract: A quintic polynomial composite spline with parameters is presented in this study to overcome the disadvantages of the cubic parametric B-spline in shape adjustment and local control. First, a class of quintic polynomial basis functions with parameters is constructed. The quintic polynomial composite spline curves with parameters are then defined in the same composite manner as cubic B-spline curves. The optimal parameter values of the quintic composite spline curves are discussed based on the energy optimization method. Finally, the corresponding composite spline surfaces are defined, and the problem of finding the optimal parameter values of the surfaces is studied using the particle swarm algorithm. The quintic composite spline not only inherits most properties of the cubic B-spline but also has stronger local control and shape adjustability. Given that the quintic composite spline is still a polynomial model, its equation structure is relatively simple, which fits the requirements of actual engineering projects. Smooth quintic composite spline curves and surfaces can be obtained by the energy optimization method. The quintic composite spline overcomes the disadvantages of the cubic parametric B-spline in shape adjustment and local control, providing a practical method for free-form curve and surface modeling.
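A toy illustration of selecting a shape parameter by energy minimization, under the assumption that the curve family is exposed as a callable `curve(lam)` returning sampled points; the paper's quintic basis functions are not reproduced, and the discrete bending energy below is a standard stand-in for the smoothness functional.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def bending_energy(points):
    # Discrete approximation of the integral of the squared second derivative.
    d2 = np.diff(points, n=2, axis=0)
    return (d2 ** 2).sum()

def best_parameter(curve, lo=0.0, hi=1.0):
    # curve: lam -> (n, 2) array of points sampled along the spline curve.
    res = minimize_scalar(lambda lam: bending_energy(curve(lam)),
                          bounds=(lo, hi), method="bounded")
    return res.x
```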
Abstract: Hyperspectral data exhibit many bands and high data redundancy. This study introduces the wavelet packet entropy (WPE) feature into hyperspectral remote sensing classification. A new classification method, the wavelet packet entropy spectral angle mapper (WPE-SAM), is defined on the WP coefficients obtained from the optimal-level WP decomposition of the spectral curve. An analysis of the WPE-SAM of four types of mineral spectra from the USGS library indicates that WPE-SAM can increase the separability of different features. The Salinas data are processed by WPE-SAM in the feature space, the optimal decomposition level is analyzed in the experiment, and the classification accuracies of WPE-SAM and SAM are compared. Experimental results show that the WPE feature describes the original spectral features well. The WPE-SAM classification method is feasible: the overall classification accuracy improved from 78.62% for SAM to 78.66% for WPE-SAM, the Kappa coefficient increased from 0.769 0 to 0.769 5, and the average classification accuracy increased from 83.14% to 84.18%. Classification results on the Pavia data also show that WPE-SAM has universal applicability. The WPE feature describes the original spectral features, such as reflectance peaks and absorption valleys, well, and WPE-SAM increases the separability of different features. Experimental results show that the WPE-SAM classification method is feasible, that its overall classification accuracy and Kappa coefficient are higher than those of SAM, and that it exhibits strong universal applicability. The accuracy and efficiency of WPE-SAM can be further improved.
Keywords: entropy; wavelet packet subband; hyperspectral image classification; feature extraction; optimal decomposition level
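A hedged sketch of the WPE feature and the spectral-angle comparison using PyWavelets; the wavelet, the decomposition level, and the Shannon-entropy normalization are assumptions rather than the paper's exact choices.

```python
import numpy as np
import pywt

def wpe_feature(spectrum, wavelet="db4", level=3):
    # Full wavelet packet decomposition of one spectral curve.
    wp = pywt.WaveletPacket(spectrum, wavelet, maxlevel=level)
    feats = []
    for node in wp.get_level(level, order="natural"):
        c = np.asarray(node.data) ** 2
        p = c / (c.sum() + 1e-12)
        feats.append(-(p * np.log2(p + 1e-12)).sum())  # subband Shannon entropy
    return np.array(feats)

def spectral_angle(a, b):
    # Smaller angle = more similar feature vectors (as in classical SAM).
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return np.arccos(np.clip(cos, -1.0, 1.0))
```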
Abstract: Remote sensing images of rivers are heterogeneous, and the intensities of the background area are complicated, so traditional active contour models segment such images ineffectively. A hybrid active contour model with regional information fusion is proposed to improve the river extraction accuracy in remote sensing images of rivers in which the background area and the river area have relative contrast. The hybrid model combines the external region energy constraint terms of the CV model and of the cross-entropy-based active contour model, and different normalized ratio coefficients are incorporated into these terms. During evolution, both the variance and the cross entropy of the pixel gray values in the internal and external areas of the contour curve are calculated to guide the curve toward the object edges. Moreover, the within-cluster absolute differences of the internal and external areas of the curve are incorporated to adaptively adjust the internal and external area energy weights in place of the original fixed weights and to accelerate the evolution of the hybrid model. The proportion of the internal and external area energies changes continuously with the curve evolution, which ultimately improves the segmentation efficiency of the proposed model. Experiments are conducted on a large number of remote sensing images of rivers. The sensitivity of the proposed hybrid model is nearly 100 percent, similar to those of the CV model, the geodesic active contours model, the cross-entropy-based active contour model, a hybrid model based on the CV and geodesic active contours models, and the LGIF model. Its accuracy exceeds 90 percent, much higher than the other models, and false alarms decrease by 50 percent. The number of iterations and the running time of the proposed model are also lower than those of the other methods. The proposed model utilizes two area description criteria, within-cluster variance and cross entropy, to guide the curve evolution; this is more accurate than a single area description and thus yields better segmentation results. Moreover, the internal and external area energy weights of the proposed model are adjusted adaptively, which leads to rapid curve evolution. Extensive experimental results show that the proposed hybrid model not only obtains higher accuracy and sensitivity but also records fewer false alarms than the compared models, with clear advantages in both segmentation performance and segmentation efficiency.
Keywords: remote sensing image of river; image segmentation; regional information fusion; hybrid active contour model; within-cluster absolute differences; adaptive weights
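A rough sketch of the fused external region terms: the Chan-Vese variance term, a cross-entropy term, and adaptive weights derived from the within-cluster absolute differences. The exact cross-entropy formulation and the normalization coefficients are assumptions.

```python
import numpy as np

def region_energy(img, mask):
    # img: grayscale image in [0, 255]; mask: True inside the evolving contour.
    inside, outside = img[mask], img[~mask]
    c1, c2 = inside.mean(), outside.mean()
    # Chan-Vese-style within-region variance term.
    e_cv = ((inside - c1) ** 2).sum() + ((outside - c2) ** 2).sum()
    # Cross-entropy term between pixel values and the opposite region's mean.
    eps = 1e-12
    p_in = np.clip(inside / 255.0, eps, 1)
    p_out = np.clip(outside / 255.0, eps, 1)
    q1, q2 = np.clip(c1 / 255.0, eps, 1), np.clip(c2 / 255.0, eps, 1)
    e_ce = -(p_in * np.log(q2)).sum() - (p_out * np.log(q1)).sum()
    # Adaptive weights from within-cluster absolute differences.
    w_in = np.abs(inside - c1).mean()
    w_out = np.abs(outside - c2).mean()
    w1, w2 = w_in / (w_in + w_out), w_out / (w_in + w_out)
    return w1 * e_cv + w2 * e_ce
```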
Abstract: The common obstacles faced by automatic human body measurement systems are comparably high costs and large sizes with complex installation processes. This study proposes a human-feature automatic measurement system based on vanishing points (HuFAMS-VP). HuFAMS-VP first extracts the human body image using background subtraction followed by image contour detection. Several critical characteristic points on the contour are then obtained using edge detection and a proportion method. Finally, the vanishing point method is combined with the proportion measure to obtain the different human body features. HuFAMS-VP is applied to measure human body sizes, and the measurements are compared with the true sizes. Results show that the difference between the measured and real data is within ±5 cm, and the average CPU time is less than 2 s. The proposed system has the advantages of easy operation and low cost under the premise of high accuracy, and the results demonstrate its validity and robustness.
Keywords: background subtraction; segmentation; vanishing point; feature identification; automatic body measurement
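A minimal sketch of the first two HuFAMS-VP stages (background subtraction, then contour extraction) with OpenCV; the vanishing-point measurement itself is not shown, and the MOG2 parameters are assumed values.

```python
import cv2
import numpy as np

subtractor = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=16)

def body_contour(frame):
    # Foreground mask from background subtraction, cleaned by an opening.
    fg = subtractor.apply(frame)
    fg = cv2.morphologyEx(fg, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(fg, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Assume the largest contour is the human body.
    return max(contours, key=cv2.contourArea) if contours else None
```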
Abstract: Hashing is an effective means for large-scale image retrieval. To improve retrieval performance, hash codes should preserve semantic similarity, i.e., the distance between the hash codes of two images should be small when the images are similar. Conventional methods first extract an overall image feature and then generate a single hash code; such methods cannot characterize the content of images containing multiple objects, which results in low accuracy for multi-label image retrieval. This study proposes a new hash generation method based on object proposals: a deep-network-based framework that constructs hash functions learned directly from images with multiple labels. The model first derives a series of regions of interest that may contain objects, then generates the features of each region through deep convolutional neural networks, and finally generates a group of hash codes to describe all the objects in an image; a compact hash code is generated to represent the entire image. A novel triplet-loss-based training method is adopted to preserve the semantic order of the hash codes. Image retrieval experiments on the VOC2012, Flickr25K, and NUS-WIDE datasets show that the NDCG (normalized discounted cumulative gain) of our method improves by 2% to 4% over DSRH (deep semantic ranking hashing) and by 3% to 6% over ITQ-CCA (iterative quantization-canonical correlation analysis) on VOC2012, and by approximately 2% on Flickr25K and 4% on NUS-WIDE. For the mAP evaluation, our method improves by 2% to 5% over DSRH on the Flickr25K and NUS-WIDE datasets. Thus, the new method can describe an image accurately in a fine-grained way, and the performance improves significantly for multi-label image retrieval. Experimental results show that the fine-grained feature embedding of an image is practicable, and our method outperforms other state-of-the-art hashing methods in image retrieval.
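A hedged PyTorch sketch of a triplet ranking loss over relaxed hash codes; the margin value and the use of real-valued (e.g., tanh) relaxations are assumptions, and the paper's exact triplet formulation may differ.

```python
import torch
import torch.nn.functional as F

def triplet_hash_loss(anchor, positive, negative, margin=2.0):
    # Codes are real-valued relaxations in [-1, 1] (e.g., tanh outputs);
    # semantic order is preserved by keeping d(a, p) + margin below d(a, n).
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()
```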
Abstract: Image retrieval is an important task in computer vision, and image content description is the key to it; accurate and complete descriptions of the image content can significantly improve retrieval precision. Traditional methods describe image content by a unified fixed-length vector. However, a simple image contains only one object, whereas a complex image can contain several, and describing a complex image by the same fixed-length vector as a simple image is generally insufficient. This study proposes a varying-length sequence description model based on a recurrent neural network and a visual attention mechanism. The model first extracts low-level features by a CNN (convolutional neural network), then generates a contextual representation of the local features by an intermediate LSTM (long short-term memory), and finally produces a vector group describing the image by an attention LSTM. The attention mechanism enables the number of description vectors to match the number of labels of the described image. The model is end-to-end trainable, and we train it with a label-level triplet loss function. We apply the Hungarian algorithm to compute the similarities between two images. We also study the retrieval precision of different deep multilayer LSTMs by changing the number of LSTM layers. We performed experiments on two common datasets, MIRFLICKR-25K and NUS-WIDE. Our sequence description model increased the accuracy rate by 10 to 12 percent over the DNN-lai method in the single-label image retrieval experiment on the MIRFLICKR-25K dataset, and by approximately 10 percent over the CCA-ITQ and DSRH methods in the multi-label image retrieval experiment on the NUS-WIDE dataset. We also provide comparative results of the performance of our method against the DNN-lai method. Because our feature extraction results are of varying length, the Hungarian matching consumes much time, so our method requires a long time when querying an image in the dataset. The proposed model, which utilizes a recurrent neural network with attention LSTM to generate descriptive sequences of an image, is applicable to the task of multi-label image retrieval.
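A sketch of the Hungarian-matching similarity between two varying-length description sequences, using SciPy's optimal assignment over pairwise cosine distances.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def sequence_similarity(seq_a, seq_b):
    # seq_a: (m, d), seq_b: (n, d) description vectors for two images.
    cost = cdist(seq_a, seq_b, metric="cosine")
    rows, cols = linear_sum_assignment(cost)  # Hungarian algorithm
    # Average matched cosine similarity; higher means more similar images.
    return 1.0 - cost[rows, cols].mean()
```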
Abstract: The border ringing effect is an important factor that degrades the quality of motion-blurred image restoration. A deep analysis is performed to determine the causes of boundary ringing effects, and a new boundary ringing suppression algorithm with sine integral fitting is proposed to suppress the ringing artifacts of the restored image caused by boundary truncation. First, the boundaries of the blurred image to be restored are extended according to the previously estimated blur kernel size. Second, window functions for one-direction and two-direction transition regions are calculated based on the sine integral and double sine integral methods, respectively. Third, the window function is applied to the extended image by multiplication. Finally, the windowed image is restored by restoration algorithms, and the original area is extracted as the restored result. Our proposed method is compared with several traditional algorithms that lower the ringing effect, and it suppresses the ringing effect effectively in terms of visual quality. Peak signal-to-noise ratio (PSNR), normalized mean square error (NMSE), and the image quality index (Q) are utilized to evaluate the restored image quality. The PSNR of the image restored by the proposed method is 0.17 to 0.76 higher than that of the optimal window algorithm, the NMSE is 0.000 5 to 0.000 7 lower, and the Q value is 0.023 to 0.029 higher. The evaluation values of the proposed method are also better than those of the cyclic boundary algorithm in most cases. Time consumption is used to evaluate efficiency: compared with the cyclic boundary algorithm, the processing time of the proposed method decreases by 0.04 to 0.11 seconds for non-iterative restoration algorithms and by several seconds for iterative ones. Experimental results indicate that suppressing the ringing effect with the sine integral fitting algorithm preserves the image edges completely and requires less calculation, which is superior to other traditional methods.
Keywords: image restoration; motion-blurred image; ringing effect; sine integral method; window function
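A speculative sketch of a sine-integral transition ramp and the resulting separable 2-D window over the extended image; the normalization Si(pi*u)/Si(pi) is an assumption chosen to give a smooth 0-to-1 ramp, not the paper's actual fitting formula.

```python
import numpy as np
from scipy.special import sici

def si_ramp(n):
    u = np.linspace(0.0, 1.0, n)
    si, _ = sici(np.pi * u)          # sine integral Si(pi * u)
    return si / sici(np.pi)[0]       # normalized so the ramp ends at 1

def window_2d(h, w, border):
    # Separable window: rising ramp at each border, flat interior.
    ramp = si_ramp(border)
    wx, wy = np.ones(w), np.ones(h)
    wx[:border], wx[-border:] = ramp, ramp[::-1]
    wy[:border], wy[-border:] = ramp, ramp[::-1]
    return np.outer(wy, wx)

# Usage sketch (b = estimated blur kernel size):
# extended = cv2.copyMakeBorder(blurred, b, b, b, b, cv2.BORDER_REPLICATE)
# windowed = extended * window_2d(*extended.shape, border=b)
```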
Abstract: Face alignment is one of the most active fields of computer vision. It attempts to localize facial semantic landmarks in a given face image, which is an important step in many face-related vision tasks such as face recognition and face beautification. Cascade regression-based face alignment algorithms have recently achieved state-of-the-art performance in both accuracy and speed. Cascade regression is an iterative method that refines an initial face shape through many linearly combined weak regressors. However, most previous methods focus on boosting the learning method or extracting geometrically invariant features while ignoring the quality of the initial shape, which severely lowers their accuracy in complex scenarios such as exaggerated expressions or extreme head poses. This study proposes a cascade regression-based multi-pose face alignment algorithm initialized with estimated initial shapes. The proposed method consists of two parts. First, gradient difference features based on the first derivative of a Gaussian filter are extracted to represent the facial appearance, and a random regression forest is learned to predict initial face shapes. Second, these initial shapes are regressed separately by dedicated cascade regressors. The alignment error of this method decreases by 29.2%, 13.3%, and 9.2% on the COFW, HELEN, and 300-W databases, respectively, compared with existing methods. Experiments show that this method can eliminate the disturbance among different shapes for more accurate multi-view face alignment and runs in real time. The algorithm surpasses many state-of-the-art methods in accuracy and is more robust to complex face shapes, and the proposed initial shape estimation algorithm generates initial shapes of suitable quality that can improve existing cascade regression-based methods.
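A minimal sketch of the generic cascade-regression update S_{t+1} = S_t + R_t(phi(I, S_t)); the Gaussian-derivative features and the random-forest initializer described above are abstracted into the `features` callable and `initial_shape`, both of which are assumptions.

```python
import numpy as np

def align(image, initial_shape, regressors, features):
    # initial_shape: (n_landmarks, 2) shape predicted by the initializer.
    shape = initial_shape.copy()
    for W, b in regressors:               # each stage: a linear weak regressor
        phi = features(image, shape)      # shape-indexed appearance features
        shape += (W @ phi + b).reshape(shape.shape)
    return shape
```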
Abstract: Soccer video shots and field areas are necessary conditions for soccer video event detection, which plays an important role in the semantic analysis of soccer videos. A new fluctuation detection method is proposed to classify soccer video shots and address the shortcomings of existing shot classification methods. This method slides a window across the video frame image, records the number of times the field pixel ratio fluctuates across the threshold ratio of the in-field shot, and then judges the shot type according to the fluctuation count. A simple and efficient approach is also proposed to identify the playfield zone type by a GMM (Gaussian mixture model), applying the positional relationship between the upper left corner and the upper right corner of the field. Experimental results show that, compared with existing classification methods, the two proposed methods achieve significant improvements in accuracy and recall rate with high detection efficiency, and they meet real-time requirements. The proposed methods address the problem of the traditional sliding window method, which misclassifies frame images with a large field inclination angle, and the problem of the traditional field area detection method, which relies on field line detection and thus achieves a low accuracy rate.
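A sketch of the fluctuation count: the field-pixel ratio is sampled as a window slides across the frame, and crossings of the in-field threshold are counted to decide the shot type. The window size, step, and threshold below are assumed values.

```python
import numpy as np

def fluctuations(field_mask, win=32, step=8, thresh=0.6):
    # field_mask: boolean (H, W) map of field-colored pixels in one frame.
    h, w = field_mask.shape
    ratios = [field_mask[:, x:x + win].mean()      # field-pixel ratio per window
              for x in range(0, w - win + 1, step)]
    above = (np.asarray(ratios) > thresh).astype(np.int8)
    return int(np.abs(np.diff(above)).sum())       # number of threshold crossings

# A long in-field shot keeps the ratio above the threshold (few crossings),
# while close-up or out-of-field shots fluctuate more.
```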