Abstract: The pervasiveness of mobile cloud computing promotes an increasing number of applications that create massive amounts of screen content data, such as video conferencing, remote teaching, and desktop virtualization, and these high-resolution screen contents must be transmitted to thin clients in real time. Thus, the cloud server requires an efficient coding algorithm with low complexity and high compression. Palette coding is one of the typical screen content coding methods satisfying these requirements; it separates the screen content into a palette and an index map. The coding efficiency of the index map directly affects the overall compression performance of palette coding. However, when processing the indices in the gradient or conjunction areas of foreground objects and text edges, the efficiency of state-of-the-art predictive coding methods still needs improvement. Therefore, an index map prediction algorithm based on the Markov model is proposed. This study randomly selected 2 000 indices from those suffering from local prediction failure and divided them into three typical classes of distribution, of which the first two classes constituted more than 70%. The indices belonging to the first two classes of distribution are located in the smooth grayscale transition area of an edge, where an obvious linear change between adjacent index values produces a gradual gradient from dark to bright or from bright to dark. This linear change leads to the failure of typical predictive algorithms. Under these circumstances, a first-order 2D Markov model is adopted to describe this linearity, and a Markov prediction algorithm for the index map of screen content is therefore proposed. Our algorithm consists of three steps. First, the indices suffering from directional prediction failure are selected to create a training dataset, on which the correlation coefficients and the color transition probabilities of the Markov model are calculated. Second, when an index fails to be directionally predicted, the first-order 2D Markov model is used to compute the linear correlation among the neighboring indices to obtain its initial prediction. Third, the foreground objects and text edges exhibit a specific color transfer pattern in the anti-aliasing region, which is represented by a color transition probability; thus, the color transition probability maximization method is used to determine the optimal value of the predicted index. Experimental results show that the prediction accuracy of the proposed algorithm reaches 97.53%, which is on average 4.33% and 2.10% higher than those of the multi-stage prediction (MSP) method and the local directional correlation-based prediction method, respectively. The proposed method is particularly suitable for index prediction in video sequences with many text characters and geometric elements. Moreover, the computational complexity of the proposed algorithm is comparable to that of the local directional correlation-based prediction method and significantly lower than that of the MSP method. In particular, the actual running time of our algorithm is 95.08% less than that of the MSP method and 35.46% more than that of the local directional correlation-based prediction method.
The proposed index prediction algorithm based on the Markov model increases the prediction accuracy by exploiting the linear correlation and the special color transition mode of the indices in edge areas while maintaining low computational complexity. The proposed algorithm can be applied in the palette coding of text/graphic blocks in screen content. The conclusion of this study verifies that the prediction efficiency of the index map can be improved effectively by using the Markov property of indices. This algorithm uses only one key frame to train the parameters of the Markov model to ensure low computational complexity, considering that screen content usually presents higher temporal redundancy than natural video; however, this choice may somewhat reduce the accuracy of the trained parameters. Moreover, this study uses all of the indices suffering from prediction failure to train the parameters of the Markov model and the color transition probability, without additional operations to evaluate whether these indices belong to the first two classes of distribution. If a simple and efficient classification method can be designed, more accurate model parameters can be expected. In addition, this study only addresses the prediction of the indices in the first two classes of distribution. Given that the indices in the third class of distribution do not present obvious local correlation, they cannot be effectively predicted by the proposed Markov model. For these indices, template matching is an optional method that can be used to explore the global color transfer pattern in the edge transition region and thus realize their non-local prediction.
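A minimal Python sketch of the two prediction stages described above may help make them concrete. It is not the authors' implementation: the correlation coefficients of the first-order 2D Markov model and the color transition probability table are placeholders that the paper would train from the indices of a key frame, and the function names are hypothetical.

```python
import numpy as np

def markov_initial_prediction(index_map, r, c, a=0.5, b=0.5, d=-0.25):
    """Linear (first-order 2D Markov) estimate from the causal neighbors.
    a, b, d stand in for coefficients trained on a key frame."""
    left = index_map[r, c - 1]
    up = index_map[r - 1, c]
    up_left = index_map[r - 1, c - 1]
    return a * left + b * up + d * up_left      # real-valued initial estimate

def predict_index(index_map, r, c, transition_prob, palette_size):
    """Refine the initial estimate by maximizing the color transition probability."""
    est = markov_initial_prediction(index_map, r, c)
    prev = index_map[r, c - 1]                  # transition conditioned on the left neighbor
    # candidate palette indices closest to the linear estimate
    candidates = np.argsort(np.abs(np.arange(palette_size) - est))[:3]
    # pick the candidate with the highest learned transition probability
    return max(candidates, key=lambda k: transition_prob[prev, k])

# toy usage with a hypothetical 4-color palette and an untrained, uniform transition table
idx = np.array([[0, 1, 2, 3],
                [0, 1, 2, 3],
                [0, 1, 2, 2]])
P = np.full((4, 4), 0.25)
print(predict_index(idx, 2, 3, P, palette_size=4))
```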
Abstract: Non-photorealistic rendering is a technique committed to generating works in artistic styles. The goal of this technique is not only to represent the authenticity of graphics but also to simulate their artistic features, as well as their imperfections. The concept was first proposed in the 1980s. After years of development, non-photorealistic rendering techniques can simulate numerous painting styles, including oil paintings, watercolor paintings, Chinese ink paintings, pencil sketches, and cartoons. A sketch is a kind of painting that presents the light and shade of an object by lines. A color sketch is a sketch painted with a small number of color pencils; besides light and shade, it represents the color features of an object as well. When painting a color sketch, people follow the monochrome sketch and render different regions with different color pencils. Because the types of pencils are limited, people usually need to blend two or more colors to obtain the desired color. In addition, the color mixing of sketches differs from that of oil or watercolor paintings. Typically, people mix pigments on a palette; however, when doing a color sketch, they directly draw different color layers on the paper. The stroke layers of each color are superimposed and interspersed to build the target color through optical blending. This feature creates a very special kind of texture and color style for the color sketch. Although many people are fond of sketches, mastering the skill of pencil drawing is not easy for everybody. Based on previous works and the characteristics of the color sketch, and combining duotone color reproduction with line integral convolution (LIC) techniques, we realized an improved way of generating color sketches. First, we fulfilled a monochrome sketch simulation process based on LIC. On this basis, by researching the features of the color sketch and combining them with the duotone color reproduction technique, we realized a new approach for color sketch generation based on color customization, which can automatically convert a digital image into a colored pencil drawing. The first step of this method is image segmentation. We use a primary color, as well as a secondary color, to paint each segmented region. Our method can automatically calculate the color set of each particular region. Such color customization can successfully simulate the color mixing of sketch paintings, and it can also be extended to the simulation of similar painting styles, such as chalk drawing and pastel drawing. In addition, we propose a way to superpose real paper texture onto digital images, which makes the final effect closer to real paintings. Because the profile is one of the most important elements, we achieve an improved profile for the color sketch based on neon transformation. To improve the contrast ratio, we zoom the color of the original image before generating the white noise, which gives the final effects a clear hierarchy. We also fulfill a new way of image segmentation based on K-means clustering, which obtains suitable segmentation results for color sketch generation. To validate the effectiveness of the proposed approach, we conducted extensive experiments. Experimental results demonstrate that our approach outperforms several state-of-the-art approaches on the challenging automatic and real-time transformation from color images to color paintings, as well as bridging the gap between them.
In contrast to Yamamoto's method, the proposed technique incorporates the ideas of a hierarchical method and color scale transformation into line integral convolution with two-tone color sketch mapping, so the resulting sketch texture is closer to a hand-painted effect. Matsui's method can generate a soft texture; however, the entire stroke fusion process is complicated and time-consuming. In particular, this method does not consider sketch contours and is prone to losing the structural characteristics of the object in the drawing effect. Kim's method extracts basic image features for sketch generation; it has high efficiency and can be well applied to mobile devices. However, this method only considers the contour perspective and ignores the sketch texture effect, which often causes a loss of fidelity compared with a real color sketch. In comparison with these two approaches, the proposed method considers both contour and texture. Simulating the color image from the contour and texture aspects helps preserve object characteristics and prevents unrealistic sketch transformation. Way's method can generate a soft and saturated but cluttered texture, thus missing the natural line sense of a color sketch. To tackle this issue, we calculate the texture direction by transforming the image into the frequency domain and analyzing its energy statistics; the simulation results exhibit an obviously intensive line sense. In the color fusion aspect, Way's method is based on the color of the original image and only overlays stroke texture on the basic color. However, in the real drawing process, people can only use limited colors. We have therefore established a library of basic colors to address this challenge: all colors used for sketch simulation are fused from the library by two basic color layers. Simulation results show robustness to changing imaging conditions, as well as a satisfactory human visual sense. In general, our study reveals new insights into the drawing scheme, implementation method, algorithm complexity, and interactivity when simulating color sketches. First, the proposed method is accomplished using an image transformation approach; therefore, it can interactively transform a color image into a color sketch or any other basic color specified by the user. Second, our study encompasses contour and texture simulation, facilitating the simulation of a realistic painting process. Third, the proposed method draws inspiration from human drawing experience: we overlay and fuse different color layers to generate color texture. This approach can also be employed to simulate a wide spectrum of painting styles, such as chalk drawing and crayon drawing.
Keywords: non-photorealistic rendering; color sketch; duotone mapping; line integral convolution
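The duotone color customization step can be sketched as follows: each K-means region is assigned a primary pencil color (the library color closest to the region mean) and a secondary pencil color whose optical mixture with the primary best matches the region mean. The pencil color library, the 50/50 mixing rule, and all parameters below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np
from sklearn.cluster import KMeans

# hypothetical pencil color library (RGB)
PENCILS = np.array([[200, 40, 40], [40, 160, 60], [50, 80, 200],
                    [230, 200, 60], [120, 80, 50], [30, 30, 30]], dtype=float)

def duotone_for_region(mean_rgb, pencils=PENCILS):
    """Primary pencil: closest to the region color; secondary pencil: the one whose
    50/50 optical mixture with the primary best matches the region color."""
    primary = pencils[np.argmin(np.linalg.norm(pencils - mean_rgb, axis=1))]
    mixes = 0.5 * primary + 0.5 * pencils
    secondary = pencils[np.argmin(np.linalg.norm(mixes - mean_rgb, axis=1))]
    return primary, secondary

def segment_and_colorize(img, k=4):
    """K-means segmentation followed by duotone color customization per region."""
    h, w, _ = img.shape
    flat = img.reshape(-1, 3)
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(flat).reshape(h, w)
    pairs = {lab: duotone_for_region(flat[labels.ravel() == lab].mean(axis=0))
             for lab in range(k)}
    return labels, pairs

# toy usage on a random image
labels, pairs = segment_and_colorize(np.random.rand(32, 32, 3) * 255)
```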
Abstract: Rendering technology is used in many industries, including home decoration, to improve the visual effects of images or videos. Designers use this technology to produce realistic and attractive designs. The quality of a rendered image of a home decoration design relies on the quality of design parameters, such as sharpness or colorfulness, which depend on the renderer. The rendered images usually have few problems with sharpness, but they are often partially gray owing to the fixed parameters of the renderer and the multiple parameters of complex lighting. As such, manual optimization is necessary; this process requires a considerable amount of the designer's time and energy and cannot avoid subjectivity. A few enhancement methods have been developed to enhance the quality of rendered images, but they cannot yield acceptable results. In this paper, an adaptive enhancement method is proposed for rendered home decoration design images; the method combines three image enhancement elements, namely brightness, contrast, and saturation, through neural networks. The proposed method combines different algorithms to improve the three elements of image enhancement and uses neural networks to learn the subjective parameters of the rendered images. First, an image enhancement method based on the saturation of the original image is used to enhance the brightness and contrast of the image, wherein the saturation component of the color image is computed in the HSI color space. Two differently exposed images are generated using a weighting function, and the enhanced image is obtained by fusing the original image with the two exposed images. However, the contrast enhancement is still insufficient because this algorithm mainly aims at enhancing brightness. Therefore, histogram equalization, which is simple to understand and compute, is added to further enhance the contrast. In many conditions, histogram equalization does not produce ideal results because of noise in the images, but this problem can be disregarded for rendered images. In consideration of the similarity between brightness and contrast, the two algorithms are fused with two enhancement factors. Finally, a color matrix is used to enhance the saturation in the RGB color space. The saturation enhancement is also controlled by an enhancement factor and fused with the brightness and contrast enhancements because of the connections among them. A nonlinear mapping between the mean and variance of the brightness and saturation of the original image and the enhancement factors of brightness, contrast, and saturation is established using neural networks. Conventional image enhancement methods cannot acceptably and automatically enhance rendered images because the features of rendered images are special and the enhancement also needs to satisfy the human visual system (HVS). In the proposed method, the enhancement factors are automatically determined by the neural networks established on the three different algorithms, realizing adaptive enhancement of the rendered images. The effectiveness of the proposed algorithm is verified on several rendered home decoration design images that are partially gray to different degrees. These experimental images are all designed by the same designer to avoid unnecessary errors. The proposed method is also compared with several classical image enhancement algorithms in terms of the histogram, information entropy, average contrast (AC), and average gray (AG).
Experimental results show that the histograms of the processed images lose very little information and maintain the image features well. In addition, the information entropy, AC, and AG of the processed images increase considerably compared with those of the original images. Compared with the other methods, the proposed method achieves a greater increase in these quantitative evaluations. Experimental results show that the proposed method can effectively and adaptively enhance the rendered images. In particular, the proposed method can properly enhance the brightness, contrast, and saturation of rendered images with different degrees of degradation, which proves its suitability for the enhancement of partially gray rendered images. Furthermore, the proposed method is easy and fast to compute. However, this method has some limitations under conditions wherein the materials are more reflective than in normal situations and the sunlight is simultaneously very strong, resulting in extremely colorful and unrealistic reflective materials. Hence, the method should be further studied to adapt to such extreme situations.
Keywords: rendered home decoration design images; adaptive enhancement; enhancement factors; neural network; quantitative assessment
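The following sketch illustrates, under loose assumptions, how the mean and variance of brightness and saturation could be mapped to the three enhancement factors by a small neural network and how those factors weight the fused output. The training pairs are random stand-ins for the manually tuned factors the paper would collect, and the fusion rule is a simplified placeholder rather than the paper's exact formulation.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def image_stats(img_hsi):
    """img_hsi: H x W x 3 array of (hue, saturation, intensity) in [0, 1]."""
    s, i = img_hsi[..., 1], img_hsi[..., 2]
    return np.array([i.mean(), i.var(), s.mean(), s.var()])

# hypothetical training set: image statistics -> manually tuned factors (alpha_b, alpha_c, alpha_s)
X_train = np.random.rand(50, 4)
y_train = np.random.rand(50, 3)
net = MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=2000).fit(X_train, y_train)

def enhance(img_hsi, brightness_enh, contrast_enh, saturation_enh):
    """Fuse three pre-computed enhancement results with the predicted factors."""
    a_b, a_c, a_s = np.clip(net.predict(image_stats(img_hsi)[None])[0], 0, 1)
    fused = a_b * brightness_enh + a_c * contrast_enh + (1 - a_b - a_c) * img_hsi
    fused[..., 1] = (1 - a_s) * fused[..., 1] + a_s * saturation_enh[..., 1]
    return np.clip(fused, 0, 1)

# toy usage with random stand-ins for the three pre-enhanced images
img = np.random.rand(16, 16, 3)
out = enhance(img, img ** 0.8, (img - 0.5) * 1.2 + 0.5, np.clip(img * 1.3, 0, 1))
```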
Abstract: Fog degrades the contrast and color saturation of images shot outdoors. Therefore, the visibility of objects declines, and details become difficult to recognize. Thus, robust defogging techniques are valuable in industrial fields driven by outdoor images or videos. Currently, the mainstream defogging methods are based on the foggy image degradation model, and two main problems remain, both stemming from the estimation of atmospheric light and transmission. First, sky regions do not comply with the dark channel prior. The atmospheric light, which suffers from the interference of sunlight in images with large sky areas, is overestimated, resulting in gloomy sky regions in the fog-free images. Second, halo effects must be eliminated through a necessary transmission refinement process. However, the existing refinement methods produce unreasonable transmissions with texture-like fluctuations inside the same planar objects, causing an inconsistency between the variation trends of the transmission map and the depth information. Furthermore, these transmission fluctuations exert a negative influence on the contrast enhancement of the non-sky regions, which conform to the dark channel prior. To address the abovementioned problems, we combine sky detection with texture smoothing and propose a new image defogging algorithm. First, to address the overestimation of atmospheric light in images with sky regions, we design an adaptive atmospheric light estimation strategy based on sky detection, which avoids gloomy restoration of the sky regions. Initially, the foggy images are classified according to whether they contain sky regions. For foggy images with sky, the pixels within the sky regions are sorted by their luminance values, and the atmospheric light is estimated from the sky pixels with low luminance values. This strategy overcomes the problem of overestimated atmospheric light, resulting in bright and clean sky regions without color distortion. For foggy images without sky, an SVM-based atmospheric light validation strategy is adopted to avoid the interference of highlighted objects and to estimate reliable atmospheric light values. Second, we propose a precise transmission calculation strategy based on patch shift and the guided filter to solve the problem of insufficient contrast enhancement in the non-sky regions. In the first step, the input images are preprocessed by texture smoothing to suppress the unnecessary texture details inside the same planar objects while maintaining the necessary boundary information among different objects. Therefore, the smoothed input images maintain color consistency inside the same planar objects. Next, rough transmissions are estimated through the patch shift mechanism to restrain the halo effects, and they are then refined with the smoothed input images by the guided filter. Thus, the refined transmissions are consistent with the trend of the depth variation, which is beneficial for promoting the contrast and color saturation of the fog-free images. Third, the fog removal results are post-processed with the original input images by the joint bilateral filter to prevent the negative impact of noise inside areas with low luminance.
For foggy images with sky regions, relatively lower atmospheric light values can be obtained by our adaptive atmospheric light estimation strategy, and the problem of overestimating the atmospheric light is overcome. Therefore, the restoration results of the sky regions that do not comply with the dark channel prior are bright and clean, without color distortion. For foggy images without sky regions, the SVM-based atmospheric light validation strategy ensures that the selected locations of the atmospheric light avoid highlighted objects and lie in the regions with the highest fog density. Second, the texture details in the preprocessed images are fully suppressed through texture smoothing, and the pixel color information within the planar objects remains homogeneous and continuous. The transmissions refined with our precise transmission calculation strategy hold high consistency with the depth variation under the guidance of the texture-smoothed images, and the restored non-sky regions that conform to the dark channel prior present high contrast and color saturation. Moreover, the noise in the fog-free images is reduced by the joint bilateral filtering process. For the gloomy sky regions caused by exceptions to the dark channel prior, our method estimates reasonable atmospheric light values, which differs from previous strategies that split the sky regions and process them separately. Experimental results show that our estimated atmospheric light values are relatively low, and the restored sky regions are bright and clean. For the non-sky regions whose contrast enhancement is insufficient because of texture details in the original input images, the guidance image for transmission refinement is preprocessed through texture smoothing, which differs from the direct transmission refinement strategies used in previous methods. Moreover, our estimated transmissions are consistent with the varying depth information, which contributes to the enhancement of contrast and color saturation. Compared with the existing image defogging methods, our fog removal results have higher contrast and color saturation with natural sky regions, which qualifies them for outdoor application fields, such as video surveillance, traffic regulation, and object detection. However, excessive color saturation enhancement can lead to local color distortion; thus, a reasonable and practical control strategy for the color saturation enhancement is our next research goal.
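A short sketch of the adaptive atmospheric light step for images with sky, together with the standard degradation-model inversion, is given below. The sky mask is assumed to come from the sky detector, and the low-luminance fraction is an illustrative parameter rather than the paper's value.

```python
import numpy as np

def atmospheric_light_with_sky(img, sky_mask, low_fraction=0.1):
    """img: H x W x 3 in [0, 1]; sky_mask: boolean H x W from a sky detector.
    Average the darker portion of the sky pixels instead of the brightest pixels."""
    sky_pixels = img[sky_mask]                        # N x 3
    luminance = sky_pixels.mean(axis=1)
    n = max(1, int(low_fraction * len(sky_pixels)))
    darkest = sky_pixels[np.argsort(luminance)[:n]]   # sky pixels with low luminance
    return darkest.mean(axis=0)                       # one value per RGB channel

def recover(img, transmission, A, t0=0.1):
    """Standard degradation-model inversion J = (I - A) / max(t, t0) + A."""
    t = np.clip(transmission, t0, 1.0)[..., None]
    return np.clip((img - A) / t + A, 0, 1)
```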
Abstract: Fine-grained classification has gained increasing attention in recent years. The subtle differences among categories remain challenging and can be addressed by localizing the parts of the object, but this step requires a considerable amount of manual work. In this regard, bilinear convolutional neural network (B-CNN) models have been established, which use two feature extractors to represent an image. B-CNN only needs image labels and yields accurate results. However, B-CNN cannot distinguish confusing categories because the networks and classifiers are trained on all training images. We propose a hierarchical B-CNN model guided by classification error to group confusing categories, which are then retrained and reclassified. The model can distinguish these categories and improve the classification accuracy on fine-grained classification targets. This work mainly aims to retrain and reclassify confusing categories. First, we propose a clustering algorithm guided by classification error to obtain clusters containing frequently misclassified categories. The algorithm is based on the constrained Laplacian rank (CLR) method, and the affinity matrix is constructed from the classification error matrix. Considering that the labels of the test images are unknown, we conduct experiments on validation images. The classification error matrix is obtained by comparing the classification results with the real labels of the validation images. Second, we propose a new hierarchical B-CNN model. In the first layer, the networks and classifiers are trained on the entire training set, and the test images are preliminarily classified. In the second layer, the networks and classifiers are trained on each cluster, and the test sets are reclassified. We select three datasets, namely, CUB-200-2011, FGVC-Aircraft-2013b, and Stanford-cars. First, we train the networks and classifiers on the entire training sets and obtain the classification results of the validation images. The CUB-200-2011 and Stanford-cars datasets do not have validation sets; as such, a part of each training set is randomly assigned as the validation set. We obtain the classification error matrix from the classification errors on the validation set; the matrix comprises two columns designated for the classification result and the real label. Second, for a dataset containing c categories, we construct the affinity matrix of size c × c, whose (i, j) entry refers to the frequency at which samples of the ith category are misclassified as the jth category. We also normalize the affinity matrix to obtain improved clustering results. All the categories are divided into different groups by using the CLR algorithm, and each group contains only a few categories, which can then be more easily distinguished from one another. Finally, we extract training and testing sets by group for retraining and reclassification. We retrain the convolutional neural networks and the SVM classifiers on each group of the training set, re-extract the features of the corresponding testing set, and reclassify them. We conduct additional experiments to verify the effectiveness of the proposed algorithm. First, we retrain only the SVM classifiers without retraining the convolutional neural networks for simplicity. Second, we retrain the SVM classifiers guided by the distance of the features instead of the classification error.
The classification accuracies of the single B-CNN model for CUB-200-2011, FGVC-Aircraft-2013b, and Stanford-cars are 84.35%, 83.56%, and 89.45%, respectively, which increase to 84.48%, 84.01%, and 89.66%, respectively, after retraining the SVM classifiers and reclassifying the test samples guided by the classification error; moreover, the accuracies increase to 84.67%, 84.11%, and 89.78%, respectively, when the hierarchical B-CNN model is used. However, the accuracy obtained after retraining the SVM classifiers and reclassifying the test samples guided by the distance of the features is lower than that obtained using the single B-CNN model. Experimental results show that retraining the SVM classifiers guided by the classification error and retraining the networks can improve the classification accuracy, whereas the accuracy obtained with the distance of the features is low. In B-CNN models, the networks and classifiers are constructed on all training samples, resulting in confusing categories that are difficult to classify. In this paper, we propose a new hierarchical B-CNN model guided by classification error, in which clusters of confusing categories are collected together; we retrain and reclassify each cluster to distinguish the confusing categories. The classification error matrix is directly related to the classification problem and can provide higher classification accuracy than feature similarity. The experimental results on the three datasets confirm that the proposed model can effectively improve the classification accuracy but requires a considerable amount of time. The model is suitable for fine-grained classification tasks, especially when dealing with similar targets. In our future work, we will develop the model in two aspects. First, the number of clustering subsets after one clustering operation is relatively high, and the subsets can be further clustered; we will attempt to deepen the model from two layers to more. Second, this paper adopts a clustering method for automatic selection; other effective methods will be explored in our future studies.
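The error-guided grouping idea can be sketched as follows: a c × c classification error matrix is accumulated from validation predictions, symmetrized and normalized into an affinity matrix, and the categories are clustered so that frequently confused classes fall into the same group. Spectral clustering is used here only as a simple stand-in for the constrained Laplacian rank (CLR) method used in the paper, and all sizes are illustrative.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def error_affinity(y_true, y_pred, n_classes):
    """Build a symmetric, normalized affinity matrix from misclassification counts."""
    E = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        if t != p:
            E[t, p] += 1                  # samples of class t misclassified as p
    A = E + E.T
    A /= A.max() if A.max() > 0 else 1.0
    A += 0.05                             # small floor keeps the affinity graph connected
    np.fill_diagonal(A, 1.0)
    return A

def group_confusing_classes(y_true, y_pred, n_classes, n_groups):
    A = error_affinity(y_true, y_pred, n_classes)
    return SpectralClustering(n_clusters=n_groups, affinity="precomputed").fit_predict(A)

# toy usage: six classes where (0, 1) and (2, 3) are frequently confused
y_true = np.repeat(np.arange(6), 20)
y_pred = y_true.copy()
y_pred[:5] = 1          # some class-0 samples predicted as class 1
y_pred[40:45] = 3       # some class-2 samples predicted as class 3
print(group_confusing_classes(y_true, y_pred, 6, 3))
```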
Abstract: Edge detection is one of the most fundamental operations in image processing and scene analysis systems because edges form the outline of an object. Edge detection is the procedure of detecting meaningful discontinuities of the image function, and it provides an effective basis for image segmentation, image fusion, and pattern recognition. Gray image edge detection has developed to relative maturity; however, color image edge detection has not received the same attention. To date, most existing color image edge detection algorithms are monochromatic-based methods, which produce a better effect than traditional gray-value methods, but neither class of methods completely utilizes the chromatic information. Meanwhile, vector-valued techniques treat the color information as color vectors in a vector space equipped with a vector norm, thereby solving this problem; however, the vector-valued methods have high complexity and heavy computation. A color image with three components can be represented in quaternion form as pure quaternions, which preserve the vector features of the image pixels well. Consequently, an edge detection algorithm combining the smallest univalue segment assimilating nucleus (SUSAN) and the quaternion in RGB space is proposed to deal with several problems in traditional color image edge detection methods, such as the insufficient use of chromatic information in color images, the large amount of time and space consumed in the nonlinear transformation between color models, and the complexity of algorithm implementation. For a preferable color image edge detection result, our method considers the algebraic operations and spatial characteristics of the quaternion, as well as the simple and effective edge detection performance of the SUSAN algorithm. The steps of this approach can be summarized as follows. First, the color image is represented with pure quaternions and each pixel is normalized. Second, edge detection is performed using the SUSAN operator, which generates thick edges because of the constraint of the fixed geometric threshold; hence, the Otsu algorithm is applied to adaptively obtain double geometric thresholds. Third, we perform edge growing on the weak edge set and determine the local edge direction according to the center of gravity and the longest axis of symmetry of the USAN region. Finally, we perform local non-maximum suppression to obtain the final thinned edge image. Three classic color images and a synthetic color image with four blocks of specific colors are selected for comparison with other edge detection algorithms, including the color Canny algorithm, the SUSAN algorithm, and our method with a fixed threshold, to demonstrate the effectiveness and robustness of our method. Two different forms of the geometric threshold are established in our method to verify whether the selection of the threshold influences the final edge image. We use the Pratt quality factor to conduct a quantitative evaluation of edge positioning accuracy. Experimental results show that our method, with fewer lost edges, can detect the edges of different color regions with similar brightness, and the extracted edges are continuous and meticulous. In addition, for color images with weak noise, our method is robust and can still effectively detect the real edge points.
Compared with the color Canny algorithm, which has a preferable edge detection effect in color images, the quality factor of our method improved by 0.012 0, and the operation time was reduced by 2.527 9 s. In this paper, we proposed an edge detection algorithm combining the smallest univalue segment assimilating nucleus and the quaternion, realizing an effective fusion of the quaternion representation and the SUSAN operator. Through several comparative experiments, subjective and objective evaluations show that our method effectively suppresses weak noise and improves the accuracy of edge localization, making it suitable for low-level color image processing with low real-time requirements.
Keywords: edge detection of color image; quaternion; SUSAN operator; Otsu algorithm; local non-maximum suppression
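A simplified sketch of the SUSAN response computed on pure-quaternion pixel differences is shown below. Because the distance between two pure quaternions reduces to the Euclidean norm of the RGB difference vector, the USAN area can be counted directly on color vectors. The brightness threshold, mask radius, and fixed geometric threshold are illustrative values, and the Otsu-based double thresholds, edge growing, and non-maximum suppression steps are omitted.

```python
import numpy as np

def susan_quaternion_response(img, t=0.1, radius=2):
    """img: H x W x 3 normalized RGB; returns an edge-strength map."""
    h, w, _ = img.shape
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    mask = (ys ** 2 + xs ** 2) <= radius ** 2
    offsets = np.stack([ys[mask], xs[mask]], axis=1)
    g = 0.75 * len(offsets)                 # fixed geometric threshold (illustrative)
    resp = np.zeros((h, w))
    for r in range(radius, h - radius):
        for c in range(radius, w - radius):
            nucleus = img[r, c]
            # the distance between pure quaternions reduces to the Euclidean norm
            # of the RGB difference vector
            d = np.linalg.norm(img[r + offsets[:, 0], c + offsets[:, 1]] - nucleus, axis=1)
            usan = np.sum(d < t)            # pixels similar to the nucleus
            resp[r, c] = max(0.0, g - usan) # small USAN area -> strong edge response
    return resp

edges = susan_quaternion_response(np.random.rand(24, 24, 3))
```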
Abstract: Currently, the efficiency and effectiveness of several existing segmentation algorithms on images with rich texture information are unsatisfactory, especially at the object edges between different texture areas. Texture information disrupts the performance of traditional segmentation algorithms; however, textures prevail in most images, such as grass, trees, sky, and sea. Segmenting texture images without losing edge information therefore becomes crucial. To solve this problem, this paper proposes a novel anisotropic image segmentation algorithm based on continuous texture gradients to reduce the influence of texture areas, and an improved watershed (WS) algorithm is proposed to preserve accurate edge data. The benefit of using the WS algorithm is its edge-preserving property; however, the texture in different areas of an image reduces the accuracy of WS segmentation. Inspired by traditional anisotropic filtering approaches, this paper presents an improved anisotropic texture gradient, which can detect areas with different textures while preserving the sharpness of texture edges. The combination of the WS algorithm and the anisotropic texture gradient maintains the completeness of the edge information among the different texture areas without degrading the segmentation of the texture areas themselves. We extend the single-channel color computation of the WS algorithm to three channels (RGB) for improved performance on color images. The proposed method maps the altitude information into a continuous texture space by using the log function to manage the sensitivity of the texture features. The continuous texture space is beneficial for watershed judgment and reduces over-segmentation because the proposed anisotropic texture algorithm is applied in the continuous texture gradient space. We conduct experiments to compare the texture segmentation results with those of classical filter operators, such as the Gaussian and mean filters, and to compare segmentation accuracy with two of the newest segmentation approaches. The datasets used are the BSD500 and Stanford background datasets, from which we select numerous images with rich texture information. In the experiments, the edges filtered by the Gaussian and mean filters are blurred, which makes segmentation difficult, whereas our anisotropic texture filtering method performs well on edge preservation while removing texture information from object images. Simultaneously, the results indicate that the proposed algorithm has obvious advantages in accuracy and edge conservation compared with the newest approaches. We compute the average segmentation accuracy to evaluate the results quantitatively. The average accuracy of our method reaches nearly 90.9%, obviously exceeding that of other recent algorithms, which verifies the effectiveness of our method. The proposed algorithm has the advantages of edge preservation and high accuracy with less over-segmentation. Its feasibility is proven through comparison with the latest segmentation algorithms. Our method is suitable for images with rich texture information, such as natural and artificial textures.
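The idea of an edge-preserving texture gradient can be sketched as follows: the per-channel gradient magnitude is log-mapped into a continuous texture space and then smoothed anisotropically so that texture inside a region is flattened while region boundaries stay sharp. A Perona-Malik-style diffusion is used below as a simple stand-in for the paper's anisotropic formulation, and the result would then feed the improved watershed step.

```python
import numpy as np

def texture_gradient(img):
    """img: H x W x 3; per-channel gradient magnitudes summed and log-mapped
    into a continuous texture space."""
    g = np.zeros(img.shape[:2])
    for ch in range(3):
        gy, gx = np.gradient(img[..., ch])
        g += np.hypot(gx, gy)
    return np.log1p(g)

def anisotropic_smooth(grad, n_iter=10, kappa=0.1, lam=0.2):
    """Perona-Malik style diffusion: small differences (texture) diffuse away,
    large differences (object boundaries) are preserved."""
    u = grad.copy()
    for _ in range(n_iter):
        dn = np.roll(u, -1, 0) - u
        ds = np.roll(u, 1, 0) - u
        de = np.roll(u, -1, 1) - u
        dw = np.roll(u, 1, 1) - u
        u = u + lam * sum(d * np.exp(-(d / kappa) ** 2) for d in (dn, ds, de, dw))
    return u

smoothed = anisotropic_smooth(texture_gradient(np.random.rand(64, 64, 3)))
```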
Abstract: Video stabilization is one of the key research areas of computer vision. Currently, the three major categories of video stabilization algorithms are 2D global motion, 2D local motion, and feature trajectory stabilization. The 2D global and local motion stabilization algorithms usually cannot achieve a satisfying stabilization result in scenes with nonplanar depth variations. By contrast, the feature trajectory stabilization algorithm handles nonplanar depth variations well in such scenes and outperforms the other two classes of video stabilization algorithms. However, the feature trajectory stabilization algorithm normally suffers from stabilization output distortion and unstable local results because of its drawbacks in trajectory length, robustness, and trajectory utilization rate. To solve this problem, this paper proposes a feature trajectory stabilization algorithm using the trifocal tensor. The algorithm extracts real feature point trajectories in the video scene with the KLT algorithm and leverages the RANSAC algorithm to eliminate mismatches among the tracked feature points. The algorithm then adaptively selects a segment of the real trajectories to initialize the virtual trajectories based on the length of the real trajectories. A long virtual trajectory is constructed by applying trifocal tensor transfer to extend the initial virtual trajectory. This virtual trajectory extension stops when either the virtual trajectory exceeds half of the frame width or height, or the difference between the mean and median of the transferred points is larger than five pixels. When the number of virtual trajectories through one frame is less than 300, new initial virtual trajectories are added using the real trajectories on the same frame. With the acquired long trajectories, the algorithm odd-extends the beginning of the virtual trajectories to the first frame and the ending of the virtual trajectories to the last frame. The stabilized view is defined by the smoothed virtual trajectories output by an FIR filter. To smooth the real trajectories, the algorithm re-projects the real feature points to the stabilized views by trifocal tensor transfer and divides the original frames into 16×32 uniform mesh grids. The final stabilized frames are rendered by mesh grid warping of the original frames, where the input to the mesh grid warping is the set of smoothing vectors between the real feature points and the smoothed real feature points. Smoothing vectors with non-negligible error are deleted to guarantee the output of mesh grid warping, by discarding smoothed trajectories that are at most five frames long and by applying the RANSAC algorithm based on the affine model. Because degeneration of the trifocal tensor transfer degrades the precision of virtual trajectory construction and real feature point reprojection, the algorithm adaptively changes the size of the transfer window according to the severity of degeneration. This process guarantees that sufficient transferred points are acquired to preserve the precision of virtual trajectory construction and real feature point reprojection. In the construction of virtual trajectories, the algorithm marks the previous frame as a breakpoint and processes the partitioned video when the number of virtual trajectories through one frame drops by 25% or more relative to the previous frame. Thus, the proposed algorithm achieves an enhanced stabilization result.
Experiments on a number of videos of different types show that the proposed algorithm has advantages in video stabilization over the traditional feature trajectory stabilization algorithms based on feature trajectory augmentation or epipolar point transfer, as well as over the commercial software Warp Stabilizer. When compared with the stabilization algorithm based on feature trajectory augmentation, the testing videos are classified into the categories "simple," "running," "rolling shutter," "depth," and "driving." The "simple" videos have relatively slow camera motions and smooth depth variations. The "running" videos are captured while the users are running and are thus challenging because of excessive wobbling. The "rolling shutter" videos suffer from noticeable rolling shutter distortions. The "depth" videos have significant abrupt depth changes. The "driving" videos are captured on moving vehicles. Furthermore, when compared with Warp Stabilizer, the classification is slightly changed to "simple," "lack of long trajectory," "rolling shutter," "depth," and "driving." The "lack of long trajectory" videos lack long trajectories because of camera panning, motion blurring, or excessive jitters. To compare the stabilization results of the three algorithms, a scoring system is used to evaluate their stabilization outputs, and the results of each category are then statistically analyzed to demonstrate the performance of the proposed algorithm. The analysis indicates that the proposed algorithm requires a shorter trajectory length and achieves a high trajectory utilization rate and good robustness. When compared with the algorithm based on feature trajectory augmentation, the stabilization results have fewer distortions and better stability for 92% of the "running" videos; both algorithms have similar stabilization results for nearly 50% of the "rolling shutter" videos, and the proposed algorithm has fewer distortions for 38% of the videos in this category. Both algorithms have similar stability and no distinct distortion for 55% of the "simple" videos; the proposed algorithm achieves better stability for the remaining 45% of the videos in this category. Meanwhile, both algorithms have similar stability and extent of distortion for most of the "depth" and "driving" videos; the proposed algorithm has slightly improved stability for a few "depth" videos. When compared with Warp Stabilizer, the proposed algorithm has fewer distortions and a better overall effect for 93% of the "lack of long trajectory" videos and 71.4% of the "rolling shutter" videos. For the "simple" and "driving" videos, both algorithms achieve good stabilization results. Both algorithms achieve similar results for 75% of the "depth" videos; for the remaining 25% of the videos in this category, the proposed algorithm has fewer distortions. When compared with the stabilization algorithm based on epipolar point transfer, the proposed algorithm has fewer degenerate situations and can therefore avoid distortion introduced by a phased motionless camera or pure camera rotation. The proposed algorithm places fewer restrictions on the camera motion pattern and scene depth and is suitable for common video stabilization situations, including scenarios that lack parallax, contain nonplanar structures, or exhibit rolling shutter distortion.
The proposed algorithm can still achieve a satisfying stabilization result in scenarios that lack long trajectories because of camera panning, motion blurring, or excessive jitters. The time complexity of the algorithm still needs improvement because it requires approximately 3-5 s per frame on a machine with a 2.1 GHz Intel Core i3 CPU and 3 GB of memory. In the future, parallel computing may be a potential solution for increasing the speed.
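Two bookkeeping steps of the pipeline can be sketched compactly: FIR low-pass smoothing of a virtual trajectory to define the stabilized view, and the stopping test applied while a trajectory is extended by point transfer (stop when the mean-median spread of the transferred points exceeds five pixels or the trajectory moves beyond half of the frame width or height). The filter length and weights are illustrative, and the trifocal tensor transfer itself is assumed to be given.

```python
import numpy as np

def fir_smooth(traj, taps=15):
    """traj: T x 2 array of (x, y) positions; simple moving-average FIR filter."""
    kernel = np.ones(taps) / taps
    pad = taps // 2
    padded = np.pad(traj, ((pad, pad), (0, 0)), mode="edge")
    return np.stack([np.convolve(padded[:, k], kernel, mode="valid") for k in (0, 1)], axis=1)

def keep_extending(transferred_pts, start_pt, frame_w, frame_h, max_spread=5.0):
    """transferred_pts: N x 2 candidate positions produced by point transfer."""
    mean = transferred_pts.mean(axis=0)
    median = np.median(transferred_pts, axis=0)
    if np.linalg.norm(mean - median) > max_spread:    # transfer has degenerated
        return False
    dx, dy = np.abs(mean - start_pt)
    return dx <= frame_w / 2 and dy <= frame_h / 2    # still within half the frame size

# toy usage
smoothed = fir_smooth(np.cumsum(np.random.randn(100, 2), axis=0))
ok = keep_extending(np.random.randn(8, 2) + [320, 240], np.array([300, 230]), 1280, 720)
```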
Abstract: The performance of current machine vision is inferior to that of human vision, and simulating the human visual mechanism can improve existing algorithms. The human visual system can detect objects with high acuity and focus its attention on the region relevant to the current visual task. These advantages are attributed to the visual attention mechanism. Humans direct attention by making a series of eye movements, which take two forms: saccades and microsaccades. 1) In the saccade stage, the eyes aim to find a candidate object and thus shift sharply across the entire field of view. 2) When a candidate is identified as a target, the eyes make a series of dense tiny movements around the target, called microsaccades, to intensify the object and inhibit noise. Continuous microsaccades lead to visual fading, and the eye movement then switches back to the saccade stage to find new objects. The integration of saccades and microsaccades contributes to the rapid and efficient performance of the human visual system. This paper presents a novel saliency detection framework that simulates microsaccades and visual fading. The constructed positive feedback loop focuses on a fixation area and intensifies objects to reach a saturation of visual perception that corresponds to visual fading. In this loop, multiple random sampling of the gaze area is used to simulate the behavior of microsaccades, and random vector functional link (RVFL) networks are utilized to simulate the human neural system and produce binary visual stimuli. The proposed framework is entirely data-driven and does not require any prior knowledge or labeled samples. First, conventional saliency detection methods are used to produce a variety of saliency maps. We fuse these saliency maps into an integrated saliency map to simulate multi-channel visual perception. The integrated saliency map is then thresholded to form an initial fixation area. Multiple random sampling is subsequently performed on the pixels in the fixation and non-fixation areas. An ensemble of RVFL networks is trained online on these pixel samples, and the RVFL model is used to classify the image pixels and obtain a new (binary) fixation area. For the new fixation and non-fixation areas, iterations of "sampling - learning (modeling) - pixel classification" are performed online. If the fixation area is unchanged in an iteration, the perception is saturated and the iteration is terminated. Each binary pixel classification result is treated as a visual stimulus, and the accumulated output of multiple visual stimuli generates the new image saliency map. The last binary pixel classification result in the positive feedback loop is regarded as the foreground of the segmentation. Three popular image databases, namely, SED2, MSRA10K, and ECSSD, were chosen to evaluate the performance of our algorithm. These databases contain a total of 11 100 natural images with different salient objects and scenes, and every image is finely labeled manually for saliency detection and image segmentation. Five other models were compared, including state-of-the-art models and models closely related to our approach: BL, RBD, SF, GS, and MR. The P-R curve, F-measure, and MAE were used to illustrate the performance of the six algorithms on the three databases. Experimental results show that our method has the best performance on SED2 (two objects) and MSRA10K (single object).
Our method is inferior to BL and relatively close to RBD on the ECSSD (complex scene and multi-object) database, while it outperforms the remaining algorithms. The performance of BL, RBD, SF, GS, and MR can be effectively improved by adding the learning-based positive feedback on the SED2 database. Experimental images illustrate that the new method is consistent with the visual saliency perceived by humans, owing to the positive feedback and the accumulation of visual stimuli. From the view of qualitative evaluation, the binary result detected by our method is clearly closer to the ground truth than those of the others. The positive feedback iteration saturates rapidly, and the running time of the algorithm increases insignificantly. The framework can therefore be treated as an effective post-processing module that improves the performance of conventional saliency detection algorithms. This paper proposes a novel saliency region detection method based on machine learning and positive feedback of perception. Motivated by the human visual system, we construct a framework that uses RVFL networks to process visual information from coarse to fine, form a saliency map, and extract salient objects. Our algorithm is entirely data-driven and does not require any prior knowledge compared with the existing algorithms. Experiments on several standard image databases show that our method not only improves the performance of conventional saliency detection algorithms but also successfully segments objects in different scenes.
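One iteration of the "sampling - learning - pixel classification" positive-feedback loop can be sketched with a small NumPy implementation of an RVFL network, as below. The pixel features (RGB plus normalized position), the hidden-layer size, and the sample counts are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def pixel_features(img):
    """RGB plus normalized (row, col) position for every pixel."""
    h, w, _ = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    return np.column_stack([img.reshape(-1, 3), yy.ravel() / h, xx.ravel() / w])

def train_rvfl(X, y, n_hidden=50, reg=1e-3, seed=0):
    """RVFL: random input-to-hidden weights, ridge-regression output weights
    over the concatenation of direct links and hidden outputs."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    D = np.hstack([X, np.tanh(X @ W + b)])
    beta = np.linalg.solve(D.T @ D + reg * np.eye(D.shape[1]), D.T @ y)
    return W, b, beta

def classify(X, W, b, beta):
    return (np.hstack([X, np.tanh(X @ W + b)]) @ beta) > 0.5

def feedback_iteration(img, fixation_mask, n_samples=500, seed=0):
    """Sample fixation / non-fixation pixels, train an RVFL, reclassify all pixels."""
    rng = np.random.default_rng(seed)
    X = pixel_features(img)
    y = fixation_mask.ravel().astype(float)
    idx = np.concatenate([rng.choice(np.flatnonzero(y == 1), n_samples, replace=True),
                          rng.choice(np.flatnonzero(y == 0), n_samples, replace=True)])
    W, b, beta = train_rvfl(X[idx], y[idx])
    return classify(X, W, b, beta).reshape(fixation_mask.shape)   # new fixation area

# toy usage: the initial fixation mask would come from the integrated saliency map
img = np.random.rand(32, 32, 3)
init_mask = np.zeros((32, 32), dtype=bool)
init_mask[10:22, 10:22] = True
new_mask = feedback_iteration(img, init_mask)
```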
Abstract: In a conventional augmented reality system, multi-scale image representations of a template image are constructed first. Feature key points at each scale are extracted and combined into a template feature set, which is used to match the feature points extracted from the camera images. The number of feature points of the template image becomes large when the number of scales in the template image representation is large. Nevertheless, a camera image only corresponds to the template images within a scale range similar to its own scale, and it probably overlaps with these images only in partial regions. This means that a large amount of useless computation exists in conventional feature matching algorithms, which simultaneously lowers the image matching speed and decreases the registration accuracy. This paper proposes an effective method to locate the image matching scales and regions in camera pose tracking and thus solve the aforementioned problem. By performing local feature matching between the current camera image features and the features of the corresponding scales and regions of the template image pyramid, the camera pose is computed in real time from the feature matching pairs, which addresses the feature matching accuracy and efficiency problems of traditional three-dimensional tracking methods. In the preprocessing stage, the scale-space layers of a template image are constructed first. Concretely, an image is obtained by down-sampling the original image by a factor of 1.5 and is placed as the second layer. The other layers are formed by progressively half-sampling the original image and the second-layer image and interleaving the two sequences, until the image resolution at the highest layer is just smaller than that of the specified screen image. Second, the key frame structure for each layer image is built. Specifically, each layer image is partitioned into identical rectangular regions, which may overlap when necessary; the size of the rectangular region is selected to be similar to that of the layer image at the maximum scale in the scale-space layers. In each region, feature points are extracted and binary descriptors are generated using the oriented FAST and rotated BRIEF (ORB) algorithm, and each rectangular position, sub-image, and the feature points within it are combined to form a key frame structure. In this way, the feature descriptors of the image pyramid are managed according to scales and regions. In the real-time tracking stage, the scale range of the current camera image within the image pyramid is located first. The covered image regions within this scale range are found using defined overlapping-degree rules, thereby narrowing the scope of feature matching between the current camera image features and the template image pyramid and improving the feature matching accuracy and efficiency through local feature matching. 1) In locating the scale range, a camera image, which is captured at some distance from the template image, essentially corresponds to a scale range in the image pyramid of the template image and overlaps with some image regions within that scale range. This paper proposes a method for locating this scale range.
First, this method predicts the current camera pose in two ways: using the last frame's camera pose and predicting the pose by Kalman filtering. The four vertices of the original image are projected onto the screen image with the estimated camera pose; finally, the size of the projection area is obtained and compared with the layer image sizes in the image pyramid to determine the scale range. 2) In calculating the degree of region overlapping, all key frame regions in the layer images within the scale range are projected onto the screen image with the estimated camera pose to calculate the areas of the overlapped regions, and the region overlapping degree is calculated using our method. 3) In local feature extraction and matching, a number of key frames with large region overlapping degrees are obtained for the camera image by using the last frame's camera pose as the estimate; other key frames are obtained similarly by using the pose estimated by Kalman filtering. We take the union of the two key frame sets, match all their feature points with those extracted from the camera image through the ORB algorithm, and compute the camera pose from the matching pairs. The new algorithm is implemented and run on a smartphone and tested on an open image database (the Stanford mobile visual search dataset) with images of different resolutions, as well as on other template images. The new algorithm is compared with four advanced algorithms, namely, fast locating of image scale and area, ORB, FREAK (fast retina keypoint), and BRISK (binary robust invariant scalable keypoints). In the experiments, videos are recorded for all testing template images, covering camera translations, rotations, and scaling relative to the template images. The optimal parameters of the ORB, FREAK, and BRISK algorithms are selected by analysis and tests, and the registration error and running frame rates are tested before and after integrating our feature matching algorithm with the optical flow algorithm. Experimental results show that our new algorithm is robust, achieves high registration accuracy of approximately one pixel, and reaches a real-time 3D tracking rate of 20-30 frames per second. The algorithm locates the image scale and region much better than before, and the feature matching accuracy and speed between the current camera image and the template image increase obviously compared with several classic algorithms, especially when the image resolution is high. This algorithm can be used to track natural images on a mobile platform.
Keywords: augmented reality; three-dimensional tracking; feature matching; mobile platform; template image; locating image matching scale and region
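The scale-range locating step can be sketched as follows: the four template corners are projected into the current view with the estimated camera pose (reduced here to a 3×3 homography for brevity), the projected area is measured, and the pyramid layers whose resolution is closest to that area define the candidate scale range. Layer generation follows the alternating 1× / 1.5× half-sampling scheme described above; the minimum layer size and the number of returned layers are illustrative.

```python
import numpy as np

def pyramid_sizes(w, h, min_size=64):
    """Layers from the full-resolution image and from a 1.5x down-sampled copy,
    each progressively half-sampled, interleaved by size."""
    sizes = []
    for bw, bh in [(w, h), (w / 1.5, h / 1.5)]:
        while min(bw, bh) >= min_size:
            sizes.append((int(bw), int(bh)))
            bw, bh = bw / 2, bh / 2
    return sorted(sizes, key=lambda s: -s[0] * s[1])

def projected_area(H, w, h):
    """Area of the template's four corners projected by homography H (shoelace formula)."""
    corners = np.array([[0, 0, 1], [w, 0, 1], [w, h, 1], [0, h, 1]], dtype=float)
    p = (H @ corners.T).T
    p = p[:, :2] / p[:, 2:3]
    x, y = p[:, 0], p[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def locate_scale_range(H, w, h, n_layers=2):
    """Pick the pyramid layers whose resolution best matches the projected area."""
    area = projected_area(H, w, h)
    sizes = pyramid_sizes(w, h)
    best = int(np.argmin([abs(sw * sh - area) for sw, sh in sizes]))
    return sizes[max(0, best - n_layers // 2): best + n_layers // 2 + 1]

# toy usage: identity pose on a 640 x 480 template
print(locate_scale_range(np.eye(3), 640, 480))
```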
Abstract: Crowd simulation has become an increasingly popular research topic because of its potential applications in virtual reality and computer animation. Most of the existing research utilizes methods such as social force, hydrodynamic, and data-driven models to control the movement of groups. Social force models treat agents as independent individuals with mass and velocity and control the movement of the agents by applying external controlling forces. Hydrodynamic models introduce the concepts of fluid dynamics into crowd simulation, which is appropriate for simulating large-scale groups. Data-driven models extract data from videos of real crowds and feed the data into the crowd simulation to obtain authentic group behavior. These methods focus on the simulation of groups that contain a large number of free-moving agents. However, these methods cannot be applied to simulating groups that move in a specific formation, which widely exist in social activities (e.g., a marching army or a dancing group). In group formation control, the agents are expected to move in a similar direction with a velocity similar to that of the other agents in the group while maintaining an overall formation. The major difficulty of group formation control lies in the conflict between maintaining the formation and avoiding collisions. These problems are traditionally solved by using a formation mesh to represent the group. However, in previous methods, the agents are often strongly bound to their target locations in the formation, leading to stiff behavior and inefficient movement of the agents. A modified mesh-guided method is employed, and a deformable mesh is adopted to control the group motion, thereby achieving a group simulation with a certain formation. The formation is initially divided into a triangular mesh connecting all agents. The formation mesh is then deformed by the potential field of the obstacles during group motion; thus, the agents can move in a certain formation without collision. Finally, an attraction point-based mesh-guided method is proposed to address the "dislocation phenomenon" that may occur among local agents when obstacles pass through the formation mesh. The implementation of the proposed algorithm can be divided into three phases, namely, initialization, deformation, and recovery. In the initialization phase, the agents are grouped in a virtual environment and the locations of the agents within the same group are stored in different queues. The points in each queue are then triangulated using Delaunay triangulation to form a triangle list, that is, the formation mesh. The virtual environment is also processed in this phase: the environment is divided into uniform grids, and the potential field of each grid is computed according to the distribution of the obstacles. At the end of this phase, the target area of each group is set; this area has a strong attraction force on the group agents and contains no obstacles. In the deformation phase, the vertices of the formation mesh are driven by the attraction force. The velocities of the vertices that enter a potential field depend on the resultant of the attraction and repulsive forces, whereas the velocities of the other vertices in the formation mesh are calculated from the error between their current and desired positions. The desired position of each agent is obtained by deformation rules that minimize the error metric.
If the distance between two vertices exceeds the deformation range, the meshes between the two vertices are removed from the current formation mesh to enhance the calculation efficiency of the mesh deformation stage. After the formation mesh is calculated, each agent moves along with the nearest vertex that has not been occupied by another agent. Moreover, the potential fields are updated in real time along with the dynamic obstacles. In the final recovery phase, when the removed meshes return within the deformation range, they are added back into the current formation mesh based on the initial formation. Consequently, the proposed method can quickly recover the groups to the initial formation after passing through the obstacles. The formation motions of various multi-agent groups are simulated in different virtual scenes, such as an army march, dynamic traffic flow, and group show scenes, using Unity. A series of contrasting experiments is performed with various numbers of agents. Results show that the time cost of the proposed algorithm is concentrated in the mesh deformation stage. When the number of vertices in the formation reaches 1 000, the average running time of the mesh deformation stage within one simulation step is 20.15 ms. The formation mesh generation stage is a preprocessing step that does not affect the real-time performance of the algorithm. The proposed algorithm can improve the global efficiency of agent movement by adopting the attraction point-based mesh-guided method, allowing the formation to transform easily and gracefully. The proposed approach works well in the simulation of agent groups with formation control because group collisions with obstacles, either static or dynamic, can be effectively avoided while maintaining the stability of the formation. The experimental results strongly suggest the effectiveness of the proposed algorithm.
关键词:crowd simulation;deformable mesh;collision avoidance;attraction point;guiding mesh;obstacle potential field
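To make the deformation phase above more concrete, the following Python sketch (assuming NumPy and SciPy are available) builds a Delaunay formation mesh from agent positions and advances each vertex by the resultant of an attraction force toward the target area and a repulsive force from obstacle potential fields. The function names, gain constants, and the inverse-distance potential are illustrative assumptions rather than the authors' implementation; the mesh is built once and the deformation/recovery rules for removed meshes are omitted.

```python
import numpy as np
from scipy.spatial import Delaunay

def build_formation_mesh(agent_positions):
    """Connect all agents of a group into a triangular formation mesh."""
    return Delaunay(agent_positions)  # .simplices is the triangle list

def vertex_velocity(pos, target, obstacles, k_att=1.0, k_rep=4.0, influence=2.0):
    """Resultant of the attraction toward the target area and the repulsion
    exerted by obstacle potential fields (illustrative gains and potential)."""
    v = k_att * (target - pos)                      # attraction force
    for obs in obstacles:
        d = np.linalg.norm(pos - obs)
        if 1e-6 < d < influence:                    # inside the potential field
            v += k_rep * (1.0 / d - 1.0 / influence) * (pos - obs) / d
    return v

# Toy usage: a 3x3 formation marching toward a target past one obstacle.
agents = np.array([[x, y] for x in range(3) for y in range(3)], dtype=float)
mesh = build_formation_mesh(agents)
target = np.array([10.0, 1.0])
obstacles = [np.array([5.0, 1.0])]
dt = 0.05
for _ in range(200):
    agents += dt * np.array([vertex_velocity(p, target, obstacles) for p in agents])
```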
摘要:Functional brain network(FBN) has emerged as an effective tool for examining the functional abnormalities of the brain network in patients with brain disease. An FBN is a mathematical representation of the brain, in which each brain region is a node and the functional connectivity between each pair of brain regions is an edge. The functional connectivity between the brain regions can reveal disease-related abnormalities in brain physiology. The FBN can be measured by several neuroimaging techniques. Functional magnetic resonance imaging(fMRI) is one of the most commonly used neuroimaging techniques. fMRI can detect the functional activities of the brain based on blood oxygen level dependent(BOLD) signals. Moreover, resting-state fMRI can measure spontaneous fluctuations in BOLD signals, which is useful in exploring the abnormal brain activities of patients with brain disease. Conventional FBN studies of resting-state fMRI assume the temporal stationarity of the FBN across the duration of the scan. However, these static FBN studies ignore the existence of slightly different mental activities during the entire scan session. In addition, recent studies suggest that the FBN exhibits dynamic changes, which may contain powerful information. This paper presents a multi-task fused least absolute shrinkage and selection operator(Lasso) method to construct the dynamic FBN of resting-state fMRI. The proposed multi-task fused Lasso can preserve the sparsity and temporal smoothness of the dynamic FBN. Specifically, we impose a sparsity constraint on the functional connectivity between the brain regions, which is based on the neurophysiological finding that a brain region directly interacts with only a few other brain regions in neurological processes. In addition, the adjacent fMRI sub-series are required to be similar, which is based on the temporal smoothness of the dynamic FBN. We first use the sliding window approach to generate a sequence of overlapping resting-state fMRI sub-series. Second, the proposed multi-task fused Lasso is used to construct the dynamic FBN. K-means clustering is applied to obtain the cluster centroids of the FBNs from the same class. All the cluster centroids are grouped together to form a regression matrix. Finally, the FBNs of the samples are regressed against the regression matrix to obtain the regression coefficients, which serve as features for classification. The classification can further verify the effectiveness of our method for constructing the dynamic FBN. The overall framework can be used for brain disease classification based on fMRI data, in which the features are extracted from the constructed dynamic FBN. We use a public fMRI dataset to verify the classification performance of the dynamic FBN constructed by the multi-task fused Lasso. Three groups of subjects, namely, patients with Alzheimer's disease(AD), patients with early mild cognitive impairment(eMCI), and healthy controls(HCs), from the Alzheimer's disease neuroimaging initiative(ADNI) fMRI dataset are used for the experiment. Accuracy, sensitivity, and specificity are used to assess the classification performance. For the classification of the AD patients and HCs, our method achieves 92.31% accuracy, 96.15% sensitivity, and 88.46% specificity. For the classification of the eMCI patients and HCs, our method achieves 80.00% accuracy, 83.33% sensitivity, and 76.92% specificity. For the classification of the AD and eMCI patients, our method achieves 84.00% accuracy, 84.62% sensitivity, and 83.33% specificity.
Experimental results demonstrate the improved performance of our method compared with the static and the traditional dynamic FBN models. The improved classification performance indicates that the features extracted by the multi-task fused Lasso have advantages over those of the static or the traditional dynamic FBN models for classification purposes. This study presents a method for constructing a dynamic FBN from resting-state fMRI, and the overall framework can be used for brain disease classification based on the constructed dynamic FBN. The proposed method preserves the sparsity and temporal smoothness of the dynamic FBN while improving the classification performance, which may contribute to the diagnosis of brain diseases to some extent and lead to an improved understanding of the dynamic FBN and brain diseases. The multi-task fused Lasso can be used to construct the dynamic FBN and to explore the useful dynamic information of functional connectivity. In addition, this method can be used for the classification of brain diseases based on fMRI data.
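As a rough illustration of the sliding-window construction described above, the Python sketch below splits a (time × regions) BOLD matrix into overlapping sub-series and estimates a sparse connectivity matrix for each window by regressing every region on the others with scikit-learn's Lasso. The fused penalty that additionally ties the coefficients of adjacent windows together, which is the core of the multi-task fused Lasso, is only noted in the comments; the window length, step, and regularization weight are placeholder values, not those of the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso

def sliding_windows(bold, win_len=30, step=2):
    """Split a (time x regions) BOLD series into overlapping sub-series."""
    return [bold[s:s + win_len] for s in range(0, bold.shape[0] - win_len + 1, step)]

def window_fbn(sub, alpha=0.1):
    """Sparse FBN of one window: regress each region on all the others.
    The multi-task fused Lasso additionally penalizes differences between
    the coefficients of adjacent windows; that coupling is omitted here."""
    t, n = sub.shape
    W = np.zeros((n, n))
    for i in range(n):
        others = np.delete(sub, i, axis=1)
        coef = Lasso(alpha=alpha, max_iter=5000).fit(others, sub[:, i]).coef_
        W[i, np.arange(n) != i] = coef
    return W

# Toy usage on random data standing in for one subject's resting-state fMRI.
rng = np.random.default_rng(0)
bold = rng.standard_normal((130, 90))          # 130 time points, 90 ROIs
dyn_fbn = [window_fbn(w) for w in sliding_windows(bold)]
```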
摘要:Image segmentation is a crucial step in image processing. Methods based on fuzzy clustering are among the most effective for image segmentation, and most images can be accurately segmented by such algorithms. However, with the improvement of resolution, many unavoidable geometric noises appear in remote sensing images. Generally, the geometric noises need to be ignored in image segmentation because they usually belong to the main cluster or cannot be considered as a cluster of their own. The traditional FCM algorithm and its improved variants use the membership degree as the common segmentation criterion. If geometric noises exist in the image, the clustering centers of the traditional algorithms are highly susceptible to them and are easily pulled to the wrong positions. As a result, the traditional algorithms can hardly segment this kind of image correctly. To segment images with geometric noises well, a new clustering algorithm, called the remote sensing image fuzzy segmentation algorithm using inclusion degree and membership degree, is proposed in this paper. The inclusion degree is proposed as a new measure to describe the relationship between pixels and clusters. Normally, every cluster possesses an inclusion degree for each pixel at a different level. The proposed algorithm combines the inclusion degree with the traditional membership degree to segment remote sensing images by defining a new objective function. The proposed algorithm obtains the optimal inclusion degree and membership degree through continuous iteration to minimize the objective function. Each pixel is classified into the cluster with the maximum value of the product of the membership degree and the inclusion degree. Finally, the average gray value of the pixels within a class is used to display the segmentation results. First, to demonstrate the effect of the inclusion degree, some point sets are generated to simulate the clusters and the geometric noises. The results show that the FCM algorithm can segment the clusters well when no geometric noises exist in the image; in this case, the cluster centers are located at the right positions, that is, at the centers of the two clusters' respective point sets. However, after the geometric noises are added, the cluster centers are seriously affected and moved to the wrong places, that is, to the center of the noises. In contrast, the proposed algorithm can resist the effect of the geometric noises and keep the cluster centers on their respective point sets. In addition, this paper uses the proposed algorithm to segment a simulated image and real remote sensing images, and the results are further compared with those of the traditional FCM algorithm and the FLICM algorithm. The proposed algorithm achieves higher accuracy than the FCM algorithm and the FLICM algorithm according to the experimental results. The results illustrate that the proposed algorithm can eliminate the effect of the geometric noises, whereas the other two algorithms cannot overcome this effect and produce wrong segmentation results in which the geometric noises are regarded as independent clusters. To quantitatively analyze the proposed algorithm, the producer, user, and overall accuracies and the Kappa coefficient are calculated from the confusion matrices and compared among the proposed algorithm, the FCM algorithm, and the FLICM algorithm.
The results show that the accuracies and the Kappa coefficient of the proposed algorithm are higher than those of the other two algorithms. This paper proposes a new measure to describe the relationship between the clusters and the pixels on the basis of the traditional membership degree. The proposed fuzzy segmentation algorithm combines the respective advantages of the membership degree and the inclusion degree for remote sensing images. The advantages of the proposed algorithm include fast running speed and an easily understandable theory. The experimental results show that the proposed algorithm is capable of resisting the effect of geometric noises by considering the inclusion degree and can segment remote sensing images more accurately than the other algorithms. Hence, the proposed algorithm is appropriate for images with geometric noises. However, the proposed algorithm is still limited because it does not use neighborhood information, which needs to be incorporated in the next steps of this work.
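The decision and display rules at the end of the iteration can be illustrated compactly. The Python sketch below assumes that the iterated membership and inclusion degrees are already available as (pixels × clusters) matrices, assigns each pixel to the cluster with the largest product of the two degrees, and renders each class by the average gray value of its pixels; the new objective function and its update formulas are those of the paper and are not reproduced here, and the random degrees in the usage example merely stand in for the converged solution.

```python
import numpy as np

def segment_and_render(gray, membership, inclusion):
    """Assign each pixel to the cluster with the largest product of
    membership degree and inclusion degree, then display each class by
    the average gray value of its pixels (decision rule from the abstract)."""
    h, w = gray.shape
    score = membership * inclusion              # (num_pixels, num_clusters)
    labels = score.argmax(axis=1).reshape(h, w)
    out = np.zeros_like(gray, dtype=float)
    for c in range(score.shape[1]):
        mask = labels == c
        if mask.any():
            out[mask] = gray[mask].mean()
    return labels, out

# Toy usage with random degrees standing in for the iterated solution.
rng = np.random.default_rng(1)
gray = rng.integers(0, 256, size=(64, 64)).astype(float)
U = rng.random((64 * 64, 3)); U /= U.sum(axis=1, keepdims=True)
I = rng.random((64 * 64, 3))
labels, rendered = segment_and_render(gray, U, I)
```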
摘要:Satellite imagery classification is a task that uses classification models to divide a set of satellite images into several classes. The satellite images discussed in this paper are collected from the Quickbird satellite imagery dataset. The satellite images are divided into six classes, namely, airplanes, dense residential areas, harbors, intersections, overpasses, and parking lots. Generally, the task of satellite imagery classification is difficult because of the complex targets and backgrounds in satellite images. Traditional methods, such as artificial neural networks and support vector machines, usually use low-level and manually selected features. These features are insufficient and cannot represent the multi-level and intrinsic features of satellite images. Consequently, obtaining high accuracy is difficult for classification methods that rely on low-level features. Some deep learning methods use pre-trained convolutional neural networks to extract the high-level features of satellite images and an additional classifier to classify the images. These methods perform better than the traditional methods. However, they ignore the inherent classification capability of convolutional neural networks, because training a convolutional neural network that extracts features and classifies images simultaneously requires considerable labeled satellite images, whereas such training data are limited in practice. Other methods use a stack of shallow convolutional neural networks to classify satellite images. However, the stack of low-level features remains insufficiently representative to substantially improve the classification accuracy of satellite images. In this paper, a new approach using deep convolutional neural networks is presented to improve the classification accuracy for satellite imagery. The classification accuracy of satellite images can be improved using the deep features extracted by convolutional neural networks. An end-to-end training and classification method is proposed. This method requires neither additional classifiers nor a stack of shallow convolutional neural networks to improve the capability of feature extraction from satellite images. First, a new satellite imagery dataset, which contains six classes, is constructed to address the lack of labeled training data. Second, three kinds of pre-trained deep convolutional neural network models and a directly trained shallow convolutional neural network model are used to perform the classification task for satellite images. The shallow model has few trainable weights and can be trained directly on the satellite image dataset to classify satellite images. The three kinds of deep models should be pre-trained on an auxiliary dataset because they have too many trainable weights to be trained directly on the proposed satellite image dataset. The three kinds of deep models are pre-trained on a large auxiliary dataset, which contains roughly 1 200 000 labeled training images of 1 000 classes. All of these images contain common objects that can be seen everywhere in daily life. The weights of the three deep convolutional neural network architectures can be trained adequately after pre-training on the large auxiliary dataset.
The capability of the deep models to extract representative features and to classify images is improved after pre-training, and the application objects of the models can then be transferred from common daily objects to satellite image objects. The key point of this transfer is fine-tuning the pre-trained deep models on the proposed satellite image dataset. The architectures of the three deep models are changed slightly, and then the models are fine-tuned on the proposed dataset. After fine-tuning, the three deep convolutional neural network models can classify satellite images directly without other classifiers or stacked shallow models. The proposed convolutional neural network models are validated on two datasets. One is the proposed dataset, and the other is the well-known UC Merced land use dataset. The four proposed models demonstrate high performance on the proposed dataset. The classification accuracies of the three deep models are higher than that of the shallow model. In particular, the deepest convolutional neural network model achieves the highest accuracy of 99.50% on the proposed dataset. On the UC Merced land use dataset, the results of three similar methods in the literature are compared with those of the proposed models. Two of the three comparative methods use the features extracted from pre-trained convolutional neural networks without fine-tuning and use an additional classifier to classify satellite images. The other method uses a multi-view convolutional neural network to perform the classification task. Experimental results indicate that the proposed deep models achieve the highest accuracy(96.44%) among all the models. This paper proposes a new satellite image dataset, which is representative of satellite images. Convolutional neural networks can be trained adequately on the proposed satellite image dataset, and shallow convolutional neural networks can even be trained on it directly. Pre-trained convolutional neural networks obtain better classification accuracy on other satellite imagery datasets after fine-tuning on the proposed dataset. Furthermore, the proposed deep convolutional neural network models are effective in terms of deep feature extraction and satellite image classification. The proposed models obtain more competitive results than other methods reported in the literature. The proposed deep models exhibit good generalization capability and achieve high accuracy on the UC Merced land use dataset, whose images differ from those in the fine-tuning dataset in terms of scale and quality. The effective pre-training and fine-tuning, together with the depth of the proposed deep models, contribute to their good performance. In addition, the proposed models are end-to-end models; additional classifiers and stacks of shallow models are not required to classify satellite images.
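A minimal fine-tuning sketch of the kind described above is given below in Python with PyTorch/torchvision. Because the abstract does not name the three deep architectures, a ResNet-50 backbone pre-trained on ImageNet stands in for them; the six-way output layer, optimizer, and learning rate are placeholder choices, and the `weights="IMAGENET1K_V1"` argument assumes a recent torchvision version.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a CNN pre-trained on a large auxiliary dataset (ImageNet), then
# replace its classification head so it outputs the six satellite classes.
num_classes = 6                                   # airplanes, harbors, ...
model = models.resnet50(weights="IMAGENET1K_V1")  # pre-trained backbone
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Fine-tune end to end with a small learning rate; no extra classifier
# or stacked shallow networks are needed after fine-tuning.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    """One fine-tuning step on a mini-batch of satellite images."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```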
摘要:Phase unwrapping is one of the key steps in InSAR data processing, and its precision directly affects the accuracy of DEM generation or surface deformation monitoring. The phase extracted from a complex SAR interferogram is wrapped because it is measured modulo 2π; multiples of 2π must be added or subtracted to recover the absolute phase. Ideally, when the sampling rate of the image satisfies the Nyquist sampling theorem, phase unwrapping can be achieved by a simple 2D integration. However, phase inconsistencies caused by layover and shadow appear in actual data. When the integration path passes through such an area, the phase error spreads and leads to unwrapping errors. Numerous phase unwrapping algorithms have been proposed to solve the above problems. The statistical-cost network-flow algorithm(SNAPHU) is a network flow method that is widely used in InSAR data processing. Based on the SNAPHU algorithm, this study aims to find a phase unwrapping algorithm suitable for airborne SAR interferometry. When SNAPHU is applied to high-resolution airborne InSAR data, if the interferogram has phase inconsistencies caused by lines of trees, then the unwrapped result shows large areas of phase jump along the inconsistencies. SNAPHU is a global optimization algorithm and has limitations in managing local phase inconsistencies. Local phase inconsistencies can be optimized by a local algorithm. Phase inconsistencies can be described by residues, and the local data can be optimized by residue compensation. This paper proposes an airborne unwrapping algorithm that combines residue degradation with SNAPHU and thus has the advantages of both local and global optimization. Residues are degraded to non-residues according to the values of the residues and their neighboring pixels, and the local data are optimized. The modified SNAPHU algorithm is then applied to the degraded image. The proposed algorithm can be divided into three steps.(1) Residue degradation. First, the filtered interferogram is flattened. Residue degradation is then applied to the flattened interferogram. This process contains residue detection and compensation. Phase inconsistencies lead to the presence of residues; a phase inconsistency means that the absolute phase difference between two neighboring pixels exceeds π. Therefore, to degrade a residue, the pair of pixels in the residue whose phase is inconsistent is first located, and then a compensation constant C is set to compensate the residue. The residue can be degraded by multiple iterations because the compensation changes the phase value and the phase difference between the neighboring pixels. A threshold N for the remaining residues is set to balance accuracy and efficiency; when the number of residues is less than N, the compensation stops to control the degree of degradation.(2) Phase unwrapping with the airborne SNAPHU algorithm. In the airborne system, the incidence angle calculated by the original geometric model of SNAPHU is close to zero, which is obviously incorrect, because the height of the platform is far less than the radius of the Earth. Meanwhile, the Earth curvature can be ignored to simplify the model because of the narrow mapping bandwidth of the airborne system; thus, the parameters and the geometric model in the SNAPHU algorithm are modified according to the calibration parameters of the airborne InSAR system. The modified SNAPHU algorithm is employed for phase unwrapping.(3) Median filtering.
A median filter is applied to the unwrapped result, which effectively reduces the phase noise while preserving the edge information well. Therefore, a 5×5 median filter is applied to the unwrapped image to obtain the final result. The efficiency and accuracy of the proposed method are tested and validated using single-pass dual-antenna airborne InSAR data covering Jiangyou, Sichuan, acquired in 2011. When SNAPHU is used directly, the unwrapping results contain a significant error region with a large area of phase jump. However, the erroneously unwrapped regions shrink significantly after residue degradation, and the phase jump regions are effectively corrected. After the 5×5 median filtering, part of the phase noise is removed, and the unwrapping result is smooth. The performance of the improved method is evaluated by the number of discontinuous points in the azimuth and range directions: the smaller the number of discontinuous points, the better the anti-phase-distortion performance and the higher the unwrapping quality. The residue degradation effectively reduces the number of discontinuous points in the interferogram. The number of discontinuous points in either direction of the improved airborne SNAPHU algorithm is lower than that of the simple airborne SNAPHU algorithm, so the anti-phase-distortion performance and the quality of the unwrapping results are obviously improved. The residue degradation process effectively solves the problem of large unwrapped phase jumps and makes the algorithm more robust to noise. The proposed algorithm combines the advantages of local and global optimization and has good overall performance. Moreover, the median filter can effectively reduce the noise.
关键词:airborne SAR interferometry;phase unwrapping;residue;statistical-cost network-flow(SNAPHU)
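For reference, the residue detection that drives the degradation step, the residue count compared against the stopping threshold N, and the final 5×5 median filter can be sketched in Python as follows. The loop-charge computation is the standard residue test on a wrapped interferogram; how each located residue is compensated with the constant C, and the modified airborne SNAPHU geometry, are specific to the paper and are not reproduced here, and the toy arrays only stand in for a real interferogram and its unwrapped result.

```python
import numpy as np
from scipy.ndimage import median_filter

def wrap(p):
    """Wrap a phase difference into (-pi, pi]."""
    return (p + np.pi) % (2 * np.pi) - np.pi

def residue_map(phase):
    """Charge of each elementary 2x2 loop of a wrapped interferogram;
    entries of about +/-2*pi mark residues (phase inconsistencies)."""
    d1 = wrap(phase[:-1, 1:] - phase[:-1, :-1])   # top edge
    d2 = wrap(phase[1:, 1:] - phase[:-1, 1:])     # right edge
    d3 = wrap(phase[1:, :-1] - phase[1:, 1:])     # bottom edge
    d4 = wrap(phase[:-1, :-1] - phase[1:, :-1])   # left edge
    return d1 + d2 + d3 + d4

def count_residues(phase):
    """Number of residues left; degradation iterates until this drops below N."""
    return int(np.count_nonzero(np.abs(residue_map(phase)) > 1e-6))

# Toy usage: detect residues on a synthetic wrapped phase, then smooth a
# stand-in unwrapped result with the 5x5 median filter used as the final step.
rng = np.random.default_rng(0)
wrapped = wrap(rng.uniform(-np.pi, np.pi, size=(128, 128)))
print("residues:", count_residues(wrapped))
unwrapped = rng.standard_normal((128, 128)).cumsum(axis=1)
smoothed = median_filter(unwrapped, size=5)
```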