Latest Issue

    Vol. 22, No. 8, 2017
    • Recent advances in correlation filter-based object tracking:a review

      Zhang Wei, Kang Baosheng
      Vol. 22, Issue 8, Pages: 1017-1033(2017) DOI: 10.11834/jig.170092
      Abstract: Visual object tracking is a key problem in computer vision, with a wide range of applications in human-computer interaction, behavior recognition, robotics, and surveillance. Recently, correlation filter theory has been widely applied to object tracking because of its efficiency and robustness. A series of new advances has attracted considerable attention, and correlation filter-based tracking methods have become a research focus in this field, achieving highly compelling performance in recent benchmarks and competitions. This paper reviews the current research status of the tracking field so that more researchers can explore the theory and development of correlation filter-based trackers (CFTs). First, the general framework of correlation filter tracking is introduced, and the correlation filter theory is presented on that basis. Then, classic CFTs, such as the kernelized correlation filter tracker, are described in detail in three parts: training, detection, and updating. Furthermore, problems that often occur in tracking, such as feature representation and scale variation, are discussed. Scale estimation strategies in CFTs are further divided into four categories depending on how they handle scale variation: scaling-pool-based strategies, part- or patch-based strategies, keypoint-based strategies, and other methods based on their proposed models. In addition, the current research status is analyzed in three aspects, namely, model-based improvement, part-based tracking, and ensemble-based tracking. Future development trends are also presented in the discussion.
      Fifty video sequences from the object tracking benchmark (OTB-2013) dataset were adopted in the experiments to analyze the performance of 45 state-of-the-art trackers, including 14 representative CFTs. These trackers are compared using average center location error, average Pascal VOC overlap ratio, and median frames per second. Precision and success plots are presented to evaluate overall and attribute-based performance. In the overall evaluation, 11 of the top 15 trackers are CFTs, fully reflecting the superiority of these trackers. In the attribute-based evaluation, benchmark sequences are annotated with 11 different attributes, such as scale variation, occlusion, and deformation; success plots for each attribute are presented, and the performance of CFTs and other state-of-the-art trackers is discussed. Experimental results demonstrate that most CFTs not only perform favorably against state-of-the-art trackers in terms of accuracy and robustness but also satisfy the demand of real-time processing. Research on correlation filter theory has achieved great improvements and has extensive applications in object tracking. However, actual scenarios are usually complex, and targets often undergo large appearance changes that easily degrade tracking performance, so object tracking remains a challenging task, and developing highly efficient and robust CFTs is considerably significant. Future studies can address the balance between accuracy and efficiency, the selection of appropriate features, and the exploitation of the spatial structure of reliable parts. The application of CFTs to multi-object tracking and long-term tracking is also a valuable direction.
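The training-detection loop described above can be sketched in a few lines. The following is a minimal single-channel, MOSSE-style sketch, not the kernelized tracker the review surveys: the filter is learned by ridge regression in the Fourier domain against a Gaussian response map, and detection locates the correlation peak. The patch size, `sigma`, and regularizer `lam` are illustrative choices.

```python
import numpy as np

def gaussian_label(h, w, sigma=2.0):
    """Desired correlation output: a Gaussian peaked at the patch center."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - h // 2) ** 2 + (xs - w // 2) ** 2) / (2 * sigma ** 2))

def train_filter(patch, y, lam=1e-2):
    """Closed-form ridge-regression filter in the Fourier domain."""
    X, Y = np.fft.fft2(patch), np.fft.fft2(y)
    return np.conj(X) * Y / (np.conj(X) * X + lam)  # element-wise division

def detect(H, patch):
    """Correlate a new patch with the learned filter; the response peak
    gives the target's (cyclic) shift relative to the training patch."""
    resp = np.real(np.fft.ifft2(H * np.fft.fft2(patch)))
    return np.unravel_index(np.argmax(resp), resp.shape)
```

If the tracked object shifts inside the search window, the response peak shifts by the same amount, which is what makes per-frame detection a single FFT round-trip.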
      Keywords: correlation filter tracking; feature representation; scale variation; part-based model
    • Strong edge-oriented blind deblurring algorithm

      Chen Huahua, Bao Zongpao
      Vol. 22, Issue 8, Pages: 1034-1044(2017) DOI: 10.11834/jig.170020
      Abstract: In image acquisition, relative displacement between the imaging device and the scene causes loss of information and blurring degradation, which significantly affects image quality and visual experience and complicates subsequent processing. Non-blind deconvolution obtains a sharp latent image from a blurred image with a known blur kernel. By contrast, blind deconvolution, which aims to estimate the unknown blur from the observed blurred image and recover a sharp original image, is challenging; it has therefore been an active research area in image processing over the last four decades. For image deblurring, most approaches introduce an image prior that favors natural images over degraded ones, which can achieve high-quality results. Thus, a blind deblurring approach based on strong edges is proposed in this study. The sparsity of the image gradient is combined with the strong edges of the latent image gradient and regularized by an adaptive l-norm. The adaptive l-norm is a weighted metric that measures the usefulness of gradients: a large metric corresponds to pixels in flat or richly textured regions, and a small metric corresponds to pixels in strong-edge regions. For the sparsity and continuity of the blur kernel, the pixels and the gradient of the blur kernel are regularized by the l and l norms, respectively. Meanwhile, the blur kernel is normalized in advance and introduced into the optimization model as a regularization term, where strong edges are used to guide the blur kernel estimate. The sparse-gradient image prior and the compound priors for the sparsity of the kernel gradient, the continuity, and the normalization of the blur kernel are considered; thus, the proposed model favors sharp images over degraded ones. Obtaining an accurate solution with the proposed model is somewhat complicated, so an alternating iteration approach is used to solve it by updating two easy sub-problems, namely, the latent image estimate and the blur kernel estimate.
      An augmented Lagrangian method (ALM) is used to identify the latent image, and a quadratic function method is used to determine the blur kernel. For the latent image, the sub-problem is equivalently formulated as an unconstrained optimization problem by introducing an auxiliary variable into the sub-model, which is solved via ALM. The solution is obtained by an alternating optimization strategy, in which the auxiliary variable is updated by an iterative hard-thresholding method and the latent image is updated by iterative fast Fourier transforms in the frequency domain. For the blur kernel, the sub-problem is likewise described as an unconstrained problem with an auxiliary variable and decomposed into two easy sub-problems with respect to the auxiliary variable and the blur kernel, iterating each alternately: the auxiliary variable is updated by the iterative hard-thresholding method, and the blur kernel is updated by iterative quadratic function optimization. Image deconvolution using a hyper-Laplacian prior can obtain a clear image with the main structures and few artifacts, but it sometimes fails to preserve fine details; the total variation norm, by contrast, can preserve abundant small textures but retains noise and ringing artifacts. For the restored image, the estimated blur kernel and the two algorithms are combined to exploit their merits and reduce their limitations: the corresponding optimization is built with respect to an intermediate image with rich salient edges, obtained by enhancing the blurred observation with a shock filter, and the sharp image is then obtained by averaging the results recovered by the hyper-Laplacian-prior method and the total variation norm-based method. Thus, the ringing effect in the restored image is reduced while more image details are preserved.
      To test the effectiveness of the proposed algorithm, the Levin set and real blurred images are evaluated against state-of-the-art algorithms. The ratio of deconvolution error (RDE) and peak signal-to-noise ratio (PSNR) are used to evaluate results on the Levin set. Experiments on the Levin data set show that the proposed method achieves a 100% success rate of blind deconvolution even with the smallest RDE of less than 2.6; this result is higher than that of the second-best method (0.3) and much higher than that of the worst method (2.4). The largest PSNR is 30.59 dB, greater than those of the second-best approach (1.01 dB) and the worst approach (19.81 dB). The detailed results show that the blur kernel obtained by the proposed method has more accurate support and less noise and yields sharp images with better visual effects. Experiments on real color images demonstrate that the proposed method obtains a more accurate blur kernel and better image quality than state-of-the-art algorithms, though its superior recovery comes at a higher time cost. The proposed method outperforms the other algorithms in latent image and blur kernel estimation and in quantitative and qualitative motion deblurring, and it can be used in remote sensing, medicine, and other fields. The comparison of time consumption shows that better overall performance could be obtained in the future by improving the algorithm's optimization and by parallel implementation.
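The alternating structure of the solver (a latent-image step, then a kernel step) can be illustrated with a much simpler stand-in: quadratic penalties solved in closed form in the frequency domain under a cyclic-convolution model, rather than the paper's ALM and hard-thresholding updates. `eps` stands in for the regularization weights, and all names are illustrative.

```python
import numpy as np

def latent_update(B, K, eps=1e-3):
    """Latent-image sub-problem: closed-form quadratic (Wiener-style) solution
    in the frequency domain, assuming cyclic convolution B = K * L."""
    Bf, Kf = np.fft.fft2(B), np.fft.fft2(K)
    return np.real(np.fft.ifft2(np.conj(Kf) * Bf / (np.abs(Kf) ** 2 + eps)))

def kernel_update(B, L, eps=1e-3):
    """Kernel sub-problem: the same quadratic form with K and L swapped, then
    projection onto the kernel constraints (nonnegative, normalized to sum 1)."""
    Bf, Lf = np.fft.fft2(B), np.fft.fft2(L)
    K = np.real(np.fft.ifft2(np.conj(Lf) * Bf / (np.abs(Lf) ** 2 + eps)))
    K = np.maximum(K, 0.0)
    return K / K.sum()

def blind_deblur(B, iters=5, eps=1e-3):
    """Alternate the two sub-problems, starting from the blurred image itself."""
    L = B.copy()
    for _ in range(iters):
        K = kernel_update(B, L, eps)
        L = latent_update(B, K, eps)
    return L, K
```

The real solver replaces both closed-form steps with edge-aware priors, which is what prevents the trivial solution (kernel collapsing to a delta) that a purely quadratic scheme tends toward.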
      Keywords: blind deblurring; strong edges; blur kernel; ringing effect; sparsity
    • Curve description and matching using arch sequence

      Wei Hui, Li Lirong
      Vol. 22, Issue 8, Pages: 1045-1055(2017) DOI: 10.11834/jig.170041
      Abstract: Curve matching is a significant problem in computer vision and image processing, with a wide range of applications in cultural-relic fragment splicing, medical image registration, and product testing. Given the importance of curve description and matching, scholars have conducted numerous studies and made significant progress in this field. However, several problems remain because practical application requirements are relatively high. At present, many common curve-matching methods can obtain acceptable classification results; however, for the geometric shape similarity problem, their results cannot accurately reflect the similarity of geometric shapes and are inconsistent with human cognition. An acceptable shape descriptor can effectively differentiate target shapes, and the descriptor of a target shape should remain unchanged under translation, rotation, and scaling. To achieve valid curve description, this study presents a curve description method based on arch sequences. Curve matching usually builds on a curve description method and uses a certain metric to determine the degree of similarity between curves. To match curves accurately and quickly, an arch-sequence matching method based on dynamic programming is proposed on top of the arch-sequence description. The similarity of two curves is determined according to the similarity of their two arch sequences.
      For curve matching, a characteristic descriptor of the curve is first defined and used to describe the curve, and an appropriate method is then used to match curves based on this description. To realize the arch-sequence description, the corners of the curve are first extracted, and the curve is divided by the corners into a series of sub-curves. Every two adjacent sub-curves can be combined to form an arch, so a contour curve can be expressed as a sequence of successively overlapping arches. Each arch in the sequence is described by the ratios of arch height to chord length, of arch height to half-chord length, and of arc length to chord length, together with the sine of the angle between the chord and the line connecting the arch's highest point to the midpoint of the chord. To compare curves, the degree of similarity between the corresponding arch sequences must be calculated, which in turn requires defining the distance between one arch and another; the ratio of arch height to chord length and the other related eigenvalues are used to compute this distance. The idea of the string edit distance from dynamic programming is adopted to calculate the minimum cost of converting one arch sequence into another, thereby obtaining the similarity between arch sequences quickly and accurately. The distance between curves is then given by the edit distance between their arch sequences.
      The arch-sequence curve description and matching method is used to stitch contours and to compute the similarity of geometric figures, thereby verifying its effectiveness. To verify that the method can be applied to contour splicing, it is first applied to splicing two fragments and then to splicing two map contours; in both experiments, the fragments and map contours are stitched together precisely. A similarity measurement is then performed on a geometric-shape test library to verify that the method can determine whether geometries belong to the same type. In the cross-measurement experiment on geometric similarity, the method accurately reflects the similarity of the figures, and this similarity accurately judges whether two images belong to the same type. The method is also used to calculate the similarity of six groups of geometric shapes, verifying that it reflects the similarity of geometric pairs, and it gives the same result as human judgment in the comparison between geometric shape and similarity. Compared with the chain-code feature, multiscale invariant, shape context, and geometry complex transform (GCT) methods, the proposed method yields distance values that better reflect image similarity.
      This study presents a curve description and matching method based on arch sequences, in which the contour curve is expressed as an arch sequence and dynamic programming is used to realize curve matching over the arch sequences. The algorithm can effectively describe and match curves with low time complexity. The method is applied to the simulation experiment on contour stitching, the cross-measurement experiment on geometric similarity, and the comparison experiment on geometric similarity degree. It accurately stitches contours, judges the similarity of geometric figures, and provides results consistent with human visual judgment. For geometric shapes with different degrees of similarity, the distance between arch sequences also effectively reflects the differences.
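The dynamic-programming matching step can be sketched as a string-style edit distance over arch sequences. Here each arch is an illustrative feature tuple (e.g. a height/chord ratio and an arc/chord ratio), the substitution cost is a plain L1 arch distance, and `gap_cost` prices insertions and deletions; the actual eigenvalues and costs used in the paper may differ.

```python
def arch_distance(a, b):
    """L1 distance between two arches, each given as a tuple of eigenvalues."""
    return sum(abs(x - y) for x, y in zip(a, b))

def sequence_edit_distance(s, t, gap_cost=1.0):
    """Minimum cost of converting arch sequence s into t: substitution costs
    arch_distance, insertion/deletion each cost gap_cost (string edit distance
    generalized from characters to arches)."""
    m, n = len(s), len(t)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * gap_cost
    for j in range(1, n + 1):
        d[0][j] = j * gap_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + gap_cost,          # delete an arch of s
                          d[i][j - 1] + gap_cost,          # insert an arch of t
                          d[i - 1][j - 1] + arch_distance(s[i - 1], t[j - 1]))
    return d[m][n]
```

Identical sequences have distance zero, and a single unmatched arch costs exactly one gap, which is the behavior the curve-distance definition relies on.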
      Keywords: curve description; curve matching; shape similarity; arch sequence; dynamic programming
    • Non-local TV-L1 optical flow estimation

      Zhang Congxuan, Chen Zhen, Wang Mingrun, Li Ming
      Vol. 22, Issue 8, Pages: 1056-1067(2017) DOI: 10.11834/jig.160594
      Abstract: Optical flow estimation is a significant research topic in computer vision, image processing, and pattern recognition. Since the original works of Horn and Schunck and of Lucas and Kanade, the accuracy of flow-field computation has increased significantly through numerous remarkable contributions over the last three decades. However, the robustness of optical flow estimation remains a challenging task at this stage of development. A non-local total variation with L1 norm (TV-L1) optical flow computational model based on weighted neighboring-triangle filtering is proposed in this study to improve the accuracy and robustness of optical flow estimation in difficult scenes, such as non-rigid motion, motion occlusion and discontinuity, large displacement, and complex edges. First, a non-quadratic penalty function based on the L1 norm and the combination of the brightness constancy and gradient constancy assumptions are employed to constitute a robust data term; thus, the negative influences of brightness changes, image noise, and motion occlusion and discontinuity can be reduced. Second, the non-quadratic L1 penalty function and an image-gradient-based self-adaptive weight are introduced to produce an image- and flow-driven smoothing term, thereby addressing the boundary blur and edge over-segmentation caused by non-rigid motion and complex edges. Third, an optical flow energy function based on the TV-L1 model is formed from the proposed robust data term and the smoothing term incorporating image and flow information. Finally, the non-local TV-L1 model for optical flow estimation is obtained by adding a non-local term based on weighted neighboring-triangle filtering to the classical TV-L1 energy function to remove the outliers in the estimated flow field caused by large displacement. The non-local term is replaced by a weighted neighboring-triangle-based median filtering to acquire the linearized numerical scheme corresponding to the non-local TV-L1 energy function; the median filtering optimizes the flow field at each layer of the image pyramid through a coarse-to-fine warping strategy.
      The test sequences of the MPI Sintel and Middlebury databases are employed to evaluate the accuracy and robustness of the proposed method against other state-of-the-art methods, including large displacement optical flow (LDOF), total variation regularization of local-global optical flow (CLG-TV), sparse occlusion detection with optical flow (SOF), and the classic model with a non-local constraint (Classic+NL), under non-rigid motion, motion occlusion and discontinuity, large displacement, complex edges, and other challenges. Experimental results show that, in comparison with the other state-of-the-art methods, the average angle error (AAE) and average endpoint error (AEE) of the proposed method decreased by 47.76% and 89.04% on the MPI test sequences and by 28.45% and 28.42% on the Middlebury test sequences, respectively. Furthermore, the time consumption of the proposed method increased by 5.16% on the Middlebury test sequences compared with the classical median-filtering-based method; the added running time may be caused by the image triangulation. The comparisons on the MPI Sintel and Middlebury test sequences show that the proposed method can be better applied to difficult scenes such as non-rigid motion, motion occlusion and discontinuity, large displacement, and complex edges, indicating higher accuracy and better robustness than the other state-of-the-art methods, especially under these challenges.
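The non-local step reduces to a weighted median over each pixel's neighborhood. Below is a minimal sketch for one flow channel, with a caller-supplied per-pixel weight map standing in for the paper's neighboring-triangle weights (a uniform map gives a plain median filter); the names and the default radius are illustrative.

```python
import numpy as np

def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half of the total weight."""
    order = np.argsort(values)
    v = np.asarray(values, dtype=float)[order]
    w = np.asarray(weights, dtype=float)[order]
    c = np.cumsum(w)
    return v[np.searchsorted(c, 0.5 * c[-1])]

def median_filter_flow(u, weights=None, radius=1):
    """Weighted median filtering of one flow channel u; `weights` holds each
    pixel's reliability (uniform by default)."""
    h, w = u.shape
    if weights is None:
        weights = np.ones_like(u, dtype=float)
    out = np.empty_like(u, dtype=float)
    for y in range(h):
        for x in range(w):
            ys = slice(max(0, y - radius), min(h, y + radius + 1))
            xs = slice(max(0, x - radius), min(w, x + radius + 1))
            out[y, x] = weighted_median(u[ys, xs].ravel(), weights[ys, xs].ravel())
    return out
```

A median (unlike a mean) discards an isolated flow outlier entirely, which is why this step removes the large-displacement outliers the abstract mentions.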
      Keywords: optical flow; weighted neighboring triangle; median filtering; non-local constraint; difficult scene
    • Invariant feature extraction and recognition for shapes

      Xu Haoran, Yang Jianyu, Huang Weiguo, Shang Li
      Vol. 22, Issue 8, Pages: 1068-1078(2017) DOI: 10.11834/jig.170080
      Abstract: The shape of an object's contour is an important cue for image retrieval and object recognition; it is usually represented by a binary image. Although binary images of objects have few features such as color or texture, humans can still recognize them by shape alone; by contrast, computers cannot recognize the shapes of objects directly. In recent years, shape retrieval and recognition have been fundamental topics in computer vision and have been widely studied for various applications, such as character recognition, biomedical image analysis, hand-gesture recognition, robot navigation, and human gait recognition. To extract salient features that characterize a shape representatively, many shape descriptors have been proposed, with promising results. However, the influences of viewpoint variations and nonlinear deformations, such as significant intra-class differences, geometric transformations, and partial occlusions, remain challenging problems that decrease the accuracy of shape matching and recognition. Most traditional shape descriptors use either local or global information of shapes, which cannot simultaneously solve the problems of shape deformation and intra-class variation: local descriptors represent local shape features effectively but do not consider the global shape structure, whereas global descriptors are robust to local noise and deformations but ignore detailed local shape features and cannot deal with occlusion. A novel invariant multi-scale descriptor with different types of invariant features is therefore proposed to capture the local and semi-global features of shapes.
      The invariant multi-scale descriptor is defined with five types of invariants, which capture shape features in five forms: area, changing rate of area, arc length, changing rate of arc length, and central distance. These five types of invariants are normalized between 0 and 1 to capture the inconsistent variations within one shape adaptively and to avoid scale transformation. The proposed descriptor calculates the invariants at multiple scales to combine the advantages of local and global descriptors: small scales capture shape details, and large scales represent semi-global features, thereby obtaining a rich characterization of shapes. Considering that the two contours being matched usually have different numbers of sample points, the dynamic time warping (DTW) algorithm is employed to determine the best correspondence between two sequences of contour points and to provide a similarity measure between two different shapes based on their invariant multi-scale descriptors.
      The invariance and robustness of the proposed descriptor are evaluated through multiple comparative experiments. In these experiments, the five types of invariants of shapes under different influences are plotted, and their Euclidean distances are calculated to show the similarity between different shapes of the same class. The experimental results validate that the proposed descriptor is robust to rotation, scale transformation, partial occlusion, intra-class variations, articulated variations, and noise. Moreover, the effectiveness of the proposed method in shape matching is evaluated in shape-retrieval experiments on several benchmark datasets, using the bull's eye score as the criterion. In comparison with other methods, the proposed method achieves the highest accuracy on all four shape datasets: 91.79% on the MPEG-7 dataset, 89.75% on the articulated dataset, 95.27% on Kimia's 99 dataset, and 91.33% on Kimia's 216 dataset. At the same time, the average time consumed by shape recognition on the MPEG-7 dataset with the proposed method is 65 ms, better than the other recognition methods. These state-of-the-art results demonstrate that the proposed method is effective for shape recognition and retrieval tasks.
      A novel invariant multi-scale descriptor is thus proposed for shape representation, matching, and recognition. In the proposed descriptor, five types of invariants capture shape features from different aspects and are calculated at several scales, ensuring that the local and global information of shapes is represented simultaneously. The DTW algorithm determines the best correspondence between two sequences of contour points based on their invariant multi-scale descriptors, thereby providing an appropriate similarity measure for different shapes. The experimental results validate that the descriptor is invariant to rotation, scaling, partial occlusion, intra-class variations, and articulated deformations, and the plots of the different invariant functions show that both local and semi-global features are captured by the invariants at different scales. The DTW-based matching appropriately measures the similarity between different shapes regardless of the number of contour points. The retrieval experiments on the benchmark datasets verify that the proposed method has an advantage in retrieval accuracy and efficiency over other popular shape recognition methods and is suitable for shape recognition and retrieval in complex environments. However, the method does not yet exploit prior knowledge from large datasets to accelerate computation or improve retrieval and recognition accuracy; therefore, future studies will introduce metric learning into shape matching to further improve performance.
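The DTW step used to align descriptor sequences of unequal length can be sketched as the standard dynamic program. The per-point distance below is a plain absolute difference on scalar descriptors for illustration, whereas the paper compares five-invariant vectors.

```python
import math

def dtw(a, b, dist=lambda x, y: abs(x - y)):
    """Dynamic time warping: minimum cumulative cost of aligning sequence a to b,
    where one element may be matched to several consecutive elements of the other
    (this is what absorbs differing numbers of contour sample points)."""
    m, n = len(a), len(b)
    D = [[math.inf] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = dist(a[i - 1], b[j - 1]) + min(D[i - 1][j],      # stretch a
                                                     D[i][j - 1],      # stretch b
                                                     D[i - 1][j - 1])  # step both
    return D[m][n]
```

Unlike the edit distance, DTW never skips an element; it only stretches, so a resampled copy of the same contour still aligns at zero cost.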
      Keywords: feature extraction; invariant; shape description; shape matching; object recognition; pattern recognition
    • Guo Lihua, Luo Cai
      Vol. 22, Issue 8, Pages: 1079-1088(2017) DOI: 10.11834/jig.160603
      Abstract: Intelligent analysis of a user's diet by computer is a meaningful research topic. Traditional methods have focused on food recognition; however, China is a country with diverse food styles, so designing a generic food recognition method is difficult. This study therefore focuses on food material recognition instead of food recognition. A FOOD-SCUT image data set is first collected; it includes 70 types of food material commonly seen in daily life, with 8015 images in total. This study extracts different traditional features and uses different classification methods for food material recognition to compare their experimental performance. The selected features include the scale-invariant feature transform, color histogram features, histograms of oriented gradients, speeded-up robust features (SURF), local binary patterns, and Gabor features; all features are further encoded using the bag-of-words method. The selected classifiers include the support vector machine, the naive Bayes classifier, random forest, and the K-nearest neighbor classifier. Moreover, this study designs a deep neural network for food material recognition.
      Various experiments were performed to test the performance of food material recognition using different combinations of traditional features. By adjusting the feature combinations and selecting an appropriate classifier and its optimum parameters, the best recognition accuracy of the traditional-feature approach is obtained. Further experiments test the performance of a deep convolutional neural network, which is trained in two different modes, namely, direct training and pretraining; its best performance is achieved after selecting different numbers of layers and adjusting the weight-initialization method. The final experimental results show that the method based on traditional features achieves a recognition accuracy of 88.98%, whereas the deep convolutional neural network achieves 95.7%, which is 6.72% higher. The statistical analysis of the FOOD-SCUT data set shows diversity within classes and similarity between classes; therefore, the data set can be used for further food material analysis, and it provides all original images for further food material applications. Moreover, the experimental results can offer suggestions on the selection of models and parameters to reduce development time in future studies.
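The bag-of-words encoding applied on top of the hand-crafted features can be sketched as nearest-codeword quantization followed by a normalized word histogram. The codebook would normally be learned by k-means over training descriptors; here it is simply an assumed array, and all names are illustrative.

```python
import numpy as np

def bow_encode(descriptors, codebook):
    """Assign each local descriptor (row) to its nearest visual word (codebook
    row) and return the L1-normalized histogram of word counts."""
    # Squared Euclidean distance from every descriptor to every codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    words = d2.argmin(axis=1)                       # index of the nearest codeword
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```

The resulting fixed-length vector is what the SVM, naive Bayes, random forest, or K-NN classifier then consumes, regardless of how many local features the image produced.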
      Keywords: food material classification; fine-grained image classification; image recognition; deep learning; convolutional neural network; image classification
    • Adaptive K-means clustering simplification of scattered point cloud

      Chen Long, Cai Yong, Zhang Jiansheng
      Vol. 22, Issue 8, Pages: 1089-1097(2017) DOI: 10.11834/jig.160582
      摘要:With the rapid development of 3D scanning technology in the field of reverse engineering, point clouds that are obtained by 3D scanning devices are commonly dense and disordered.These characteristics of point clouds result in a large number of redundancy data.Moreover, subsequent point processing work, such as smoothing, meshing, and surface reconstruction becomes difficult and inefficient.Therefore, point cloud simplification is a considerably significant prerequisite for smoothing, meshing, surface reconstruction, and other follow-up point cloud processing works.In recent years, point cloud simplification algorithms based on feature preserving method can obtain a better simplified effect than algorithms based on nonselective reduction method presented, which have been widely utilized in several studies.However, the existing algorithms for point cloud simplification still have some inevitable shortcomings, including a large lack of fidelity to the origin, hole generation, and inadaptability to flaky point clouds.This study presents a simplification algorithm of scattered point cloud based on an adaptive K-means clustering, aiming to solve the problems in the existing aforementioned simplified algorithms. 
The curvature, average vector angle between points, the K-nearest neighborhood points, the distance from the point to its K-nearest neighborhood gravity center, and the average distance from the point to its K-nearest neighborhood points for each data point are calculated.The feature discriminant parameter and feature discriminant threshold are defined based on the four parameters regarded as the most important by the proposed algorithm.A point is recognized as a feature point when its value of discriminant parameter is greater than the threshold value.Thus, the feature extraction based on multiple parameters hybridization method is adopted to identify and preserve these feature points, including surface sharp and boundary points.Then, an adaptive octree is established for the point cloud to allow the K-means to initialize cluster centers and value, which are related to the density distribution of the point cloud.Finally, if the clustering results contain the feature points, then the feature points in the clustering are removed and the cluster centers are updated.In general, cluster members in flat areas are sufficiently similar to each other in the spatial and feature domain.Thus, the cluster center can be employed to represent the cluster.However, in high curvature areas, the cluster members may not be similar to each other in the spatial and feature field because of highly detailed features.Therefore, the cluster will be subdivided to preserve the detail features when the maximum curvature difference between data points in the cluster is greater than the threshold value.The clustering subdivision will continue until the maximum curvature difference is smaller than the threshold value or when only one data point in the cluster is observed.The nearest point to the cluster center is preserved to represent the cluster after the clustering subdivision. 
In view of clustering, the bunny point cloud is considered as an example for the comparison of the traditional K-means clustering algorithm and the adaptive K-means clustering algorithm.The comparison result shows that the adaptive K-means clustering algorithm performs better in terms of number of iterations, evaluation function value, and runtime than the K-means clustering algorithm.In the aspect of simplification, the proposed reduction algorithm is applied to two types of point clouds (i.e., closed-boundary and flaky point clouds).The experimental results show that, at a simplification ratio of 1/5, the reduction errors of the fandisk and saddle models are 0.29×10, -0.41×10, and 0.037, -0.094.Moreover, the boundary shrinkage error of the saddle model is 0.030 805.The aforementioned error values are lower than those of the grid simplification method and the curvature simplification method. The proposed scattered point cloud simplification algorithm can be used for closed-boundary and flaky point clouds.Moreover, the simplified point cloud is well distributed in space and avoids holes.The boundary points of the model are also protected when the algorithm is applied to a flaky point cloud model.  
      Keywords: point cloud simplification; octree; K-means clustering; flaky point cloud; boundary points   
    • Image copy detection method based on contextual descriptor

      Yang Xinglong, Yao Jinliang, Wang Xiaohua, Fang Xiaofei
      Vol. 22, Issue 8, Pages: 1098-1105(2017) DOI: 10.11834/jig.160562
      Abstract: With the rapid growth of Web images, image retrieval is becoming an important application.However, Web images are easily downloaded, edited, and re-uploaded.Therefore, a great number of image copies can be found on the Web.Finding and filtering image copies can improve the effectiveness of image search engines.In this study, two images are considered copies if they are generated from the same original image by certain image editing operations.Editing operations include cropping, scaling, rotating, changing the compression rate, adding other content, framing, and other non-affine geometric transformations.Over the last few years, several methods have been proposed to detect image copies in large-scale image datasets.In the early stage of image copy detection research, an entire image was represented as a feature vector or descriptor.The similarity of feature vectors or descriptors was measured to verify whether two images are copies.These methods are efficient and do not require high storage cost.However, they are not robust to certain common image editing operations, such as cropping and adding objects.Local features are more robust for image copy detection than global features.However, a local feature, such as SIFT, is a high-dimensional feature vector and results in a high time cost in local feature matching, especially in large-scale image datasets.The bag-of-words (BoW) model has been applied to the image copy detection field and is used by state-of-the-art methods on image copy detection to solve the aforementioned problem.In these methods, an image is represented as a bag of local features, which are then quantized into visual words.Inverted file indexing is applied to register images via these visual words and improves retrieval efficiency.However, visual words have significantly less discriminative power than text words because of quantization loss.The quantization loss on local features causes a large number of mismatched 
local features, which affect the precision of image copy detection.Some methods have been proposed to eliminate the influence of visual word mismatches and improve image copy detection performance.Geometric verification for rejecting visual word mismatches has become a popular post-verification step for visual words.However, geometric verification methods initially need to obtain the matched pairs of visual words between the query and candidate images.Then, the spatial similarity of the matched visual words between the two images is calculated to reject visual word mismatches.The process of rejecting mismatches is usually applied only to some top-ranked candidate images because of the expensive computational cost and the large number of candidate images in large-scale datasets.To address the problems of post-verification of visual words, one basic idea that has been explored is designing a contextual descriptor that can be used to filter visual word mismatches immediately according to the similarity of descriptors.An image copy detection method based on a contextual descriptor is proposed in this paper.The contextual descriptor consists of information about the neighboring local features and improves the discriminative power of visual words. 
In the proposed method, the contextual descriptor encodes the neighboring visual words and the spatial relations of local features.The neighboring visual words represent the neighboring local features, whereas the spatial relations are represented as angles in the contextual descriptor.If two matching visual words have similar neighboring local features, then the pair of visual words is considered a true match.The process of the proposed method is as follows.The Euclidean distance and scale of the local feature are used as the context of a local feature in an image to select the neighbors.The information about the neighbors, such as position, dominant orientation, and visual words, is used to construct the contextual descriptor.Subsequently, each candidate match of local features is verified as a true match or not according to the similarity measure of the contextual descriptors.In this measure, if the matching visual words have the same neighbors and similar spatial relations, then the matching pair of visual words is considered a true match.Finally, the similarity between images is measured by the number of true matches of visual words.In this study, neighboring visual words and spatial relations are used to verify matching visual words.The verification measure is strict, and most mismatched visual words are eliminated. 
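The core pre-verification idea can be sketched minimally:each local feature carries the visual words of its neighbors, and a candidate match is accepted only when the two contexts share enough neighbor words.This sketch omits the angular spatial relations, and the function names and the sharing threshold are illustrative assumptions, not the paper's actual parameters.

```python
def verify_match(ctx_query, ctx_candidate, min_shared=2):
    """Accept a candidate visual-word match only if the two features
    share at least `min_shared` neighboring visual words."""
    return len(set(ctx_query) & set(ctx_candidate)) >= min_shared

def image_similarity(matches, ctx_q, ctx_c, min_shared=2):
    """Image similarity = number of context-verified matches.

    `matches` pairs feature indices of the query and candidate images;
    `ctx_q[i]` / `ctx_c[j]` list the neighboring visual words of each feature.
    """
    return sum(verify_match(ctx_q[i], ctx_c[j], min_shared) for i, j in matches)
```

Because the neighbor words are stored with the descriptor at indexing time, this filter runs during retrieval rather than as a post-verification pass over top-ranked candidates, which is the efficiency argument made above.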
Experiments are performed on the Copydays database to compare the proposed method with the baseline method.The experiments show that the mean average precision (mAP) of the proposed method increases by 63% compared with the baseline method when the number of distractor images is 100 k.When the number of distractor images increases from 100 k to 1 M, the mAP of the baseline decreases by 9%, whereas that of the proposed method decreases by only 3%.In the proposed method, neighboring visual words are indexed into the contextual descriptor before image copy detection.Thus, the proposed method belongs to the pre-verification category and requires less detection time than post-verification methods, which is confirmed by the experimental results. In this study, an image copy detection method that is robust to most image editing operations, such as rotation, image overlay, scaling, and cropping, is proposed.The proposed method obtains a high mAP on a public test dataset and is efficient in real application scenarios, such as image copyright protection and image duplication removal.  
      Keywords: local feature; visual word; image copy detection; bag-of-words; image detection   
    • Luo Huilan, Lai Zeyun, Kong Fansheng
      Vol. 22, Issue 8, Pages: 1106-1119(2017) DOI: 10.11834/jig.170032
      Abstract: A video action recognition algorithm based on action segmentation and manifold metric learning is proposed to improve the accuracy of action recognition in videos. First, a video action segmentation algorithm based on analyzing the spreading area of the actor's limbs is proposed to divide the video into segments that each contain a specific action.The segmentation operation is used to recognize an action in the video quickly and to reduce the mutual interference between adjacent actions.The silhouette of the actor in a frame is extracted using the background subtraction method.Bounding boxes are generated from the silhouettes.Given that silhouette extraction is affected by the background, the area function of the bounding boxes contains some noise, which can damage the regularity of the area function.After calculating the area value of the bounding box for each frame, the area function is smoothed using a robust weighted smoothing method.Then, after extracting all the local minimum points of the smoothed area function, a second filter is used to remove fake local optimal points.After the two filtering operations, the remaining minimum points are used as the segmentation positions in the videos.Subsequently, the action recognition algorithm is independently applied to each segment.For feature extraction and description of each segment, the Lucas-Kanade optical flow field is first computed to obtain the velocity information of the pixels in each frame of the segment.The pixels with non-zero optical flow magnitude are considered interest points.Intraframe local curl and divergence, which are derived from the Lucas-Kanade optical flow field, are used to describe the motion relationship between interest points in the frame.A covariance matrix is formed for each action segment to fuse the features, including normalized global temporal features, normalized spatial features, optical flow, intraframe local curl, and divergence.The size of the final covariance matrix is 
7×7.Thus, the dimension of the feature covariance is relatively low.In this feature space, the action segment videos form a manifold.Several methods that measure distance in a manifold space have been proposed.Generally, the distance between two points in a manifold space is the geodesic distance between them.In this study, a distance measurement method obtained by supervised manifold metric learning is proposed to further improve the accuracy of action classification.The LogDet divergence is utilized, and the action class labels are used to construct a constraint.A tangent space transfer matrix is obtained through manifold metric learning.The tangent space transfer matrix maps the distance calculation into the tangent space of a new latent manifold.Finally, the nearest neighbor classification method is used to recognize the actions. The experiment consists of three parts.First, the efficiency of the action segmentation algorithm is evaluated on the Weizmann public video dataset.The results show that the proposed action segmentation method has acceptable segmentation capability.Second, an action recognition comparison with and without manifold metric learning is performed on the Weizmann dataset to show the performance of manifold metric learning.The action recognition accuracy without and with manifold metric learning is 92.8% and 95.6%, respectively, an improvement of 2.8%.Finally, experimental results on the KTH public video dataset verify the robustness of the proposed action recognition algorithm.The average recognition accuracy on KTH is 92.3%.On the Weizmann and KTH datasets, the experimental comparisons indicate that the proposed algorithm is better than some state-of-the-art methods. 
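The 7×7 covariance descriptor and a manifold distance over such descriptors can be sketched as follows.Note the hedge:the LogDet-based learned metric of the paper is not reproduced here;the widely used log-Euclidean distance between symmetric positive-definite matrices stands in as an unlearned baseline, and the feature layout is an assumption based on the abstract's list of fused features.

```python
import numpy as np

def covariance_descriptor(F):
    """F: (N, 7) per-interest-point features (e.g., normalized time, x, y,
    optical-flow u, v, local curl, divergence). Returns a 7x7 covariance."""
    return np.cov(F, rowvar=False)

def spd_log(M):
    # matrix logarithm of a symmetric positive-definite matrix via eigh
    w, V = np.linalg.eigh(M)
    return (V * np.log(np.clip(w, 1e-12, None))) @ V.T

def log_euclidean_dist(A, B):
    # log-Euclidean distance: Frobenius norm between matrix logarithms
    return float(np.linalg.norm(spd_log(A) - spd_log(B), "fro"))
```

A nearest neighbor classifier under such a distance assigns a test segment the label of the closest training covariance, which mirrors the final classification step described above.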
The proposed action segmentation method based on analyzing the spreading area of the actor's limbs can segment actions at the frames where the limbs are closest to the body.Smoothing and the second filtering step on the area function of the human bounding box improve the robustness of action segmentation against noise.The segmentation method provides a desirable pre-processing effect.The multiple features fused effectively by the covariance matrix can describe the video action appropriately.The representation capability of the covariance matrix descriptor is further improved by adding optical flow, curl, and divergence information, which describe the motion direction of the body parts in detail.Evidently, the action recognition accuracy is improved by using manifold metric learning.The performance of the proposed algorithm is improved further by adding class-label information during metric learning.All the experimental results show that the proposed video action recognition algorithm has high accuracy and desirable robustness.  
      Keywords: action recognition; action segmentation; manifold learning; metric learning; feature covariance; video analysis   
    • Local feature registration method of skull point cloud model

      Zhao Fuqun, Zhou Mingquan, Geng Guohua
      Vol. 22, Issue 8, Pages: 1120-1127(2017) DOI: 10.11834/jig.170003
      Abstract: Point cloud registration, with numerous applications including 3D modeling, object recognition, scene understanding, 3D shape detection, and craniofacial reconstruction, is a significant and active research topic in computer vision.A 3D scanner can only obtain a partial 3D point cloud of an object, associated with a single coordinate system, from one viewpoint.The 3D point clouds captured from different viewpoints must be transformed into a common coordinate system through rigid transformations to reconstruct the overall 3D shape.3D point cloud registration aims to compute the rigid transformation between 3D point clouds and automatically obtain the complete 3D shape of the object.Skull registration is an important step in craniofacial reconstruction, and its correctness directly affects the result of craniofacial reconstruction.Skull registration is the process of searching the existing skull database for one or more reference skulls that are most similar to an unknown skull.The face of the reference skull can then be used as the reference face of the unknown skull to provide a possible basis for craniofacial reconstruction.Thus far, most skull registration methods are feature based, falling into two categories:global and local feature-based methods.The extraction of feature descriptors is extremely important for registration.Global feature descriptors provide excellent discrimination for complete object representation, whereas local feature descriptors are more robust against noise and outliers.Local feature descriptors are more suitable than global ones for skull model registration because of the complexity of the skull point cloud model.Among the local feature descriptors, 3D point-based descriptors have been widely applied to represent partial objects because of their excellent generalization.A 3D point-based descriptor encodes the information of the neighboring points of an interest 
point in a compact and distinctive manner.Then, 3D points with similar local features can be identified from cluttered scenes through descriptor matching. A skull point cloud model registration method based on coarse-to-fine local features is proposed in this study to improve the accuracy and convergence rate of skull registration.The registration method consists of two steps, namely, initial and fine registration.In the initial registration, local feature representation and alignment are the key steps in recovering a coarse rigid transformation.An accurate initial transformation can improve registration efficiency and reduce the optimization error in the following fine registration step.In the fine registration, an improved iterative closest point (ICP) algorithm is used to complete accurate registration.The detailed registration method is described as follows.First, the local features of the skull point cloud model, which consist of local depth, the deviation angle between normals, and point cloud density, are extracted.Second, the coarse registration of the skull point cloud is achieved through local feature extraction, calculation of local feature correspondences, and outlier elimination.The skulls are coarsely aligned through the coarse registration step.Finally, an improved ICP algorithm that integrates a Gaussian probability model and an active iterative coefficient into the ICP algorithm is used to complete the fine registration of the skull point cloud model, obtaining an accurate registration. 
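For reference, a bare-bones ICP loop can be sketched as follows.This is deliberately simplified:it omits the paper's Gaussian probability model and active iterative coefficient, uses brute-force nearest-neighbor correspondences (a k-d tree would be used in practice), and recovers the rigid transform with the standard SVD (Kabsch) solution.

```python
import numpy as np

def best_rigid_transform(P, Q):
    """Kabsch: least-squares rotation R and translation t mapping P onto Q."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cq - R @ cp

def icp(src, dst, iters=10):
    """Iteratively match each source point to its closest destination point
    and re-estimate the rigid transform. Returns the aligned source cloud."""
    cur = src.copy()
    for _ in range(iters):
        # brute-force closest-point correspondences
        nn = np.argmin(((cur[:, None, :] - dst[None, :, :]) ** 2).sum(-1), axis=1)
        R, t = best_rigid_transform(cur, dst[nn])
        cur = cur @ R.T + t
    return cur
```

With a reasonable initial alignment (here supplied by the coarse local-feature step), the closest-point correspondences are mostly correct and the loop converges quickly; a poor initialization is exactly what the coarse registration stage is meant to avoid.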
In the experiment, public point cloud models (i.e., the rabbit and dragon point cloud models) and the skull point cloud model are used for registration.The experimental results show that the point cloud registration algorithm based on local features can complete accurate registration of all of these point cloud models.Moreover, the registration results are especially remarkable for the skull point cloud model.In the fine registration stage of the skull point cloud model, the registration accuracy and convergence rate increase by 30% and 60%, respectively, compared with the ICP algorithm.Compared with the probabilistic ICP algorithm, the registration accuracy is almost the same, and the convergence rate increases by 50%. The point cloud registration method based on local features can register public point cloud models accurately, as well as achieve extremely remarkable registration results for the skull point cloud model.Thus, it is an effective skull registration algorithm with high accuracy and convergence rate.  
      Keywords: craniofacial reconstruction; skull registration; local feature; iterative closest point; Gaussian probability model; active iterative coefficient   
    • Extracting small harbor areas in large-scene SAR images

      Zhou Qiang, Qu Changwen, Li Jianwei, Yao Libo
      Vol. 22, Issue 8, Pages: 1128-1134(2017) DOI: 10.11834/jig.170008
      Abstract: Synthetic aperture radar (SAR) can obtain high-resolution images of targets in all types of weather, all day, and at long distances.It has significant value in civil and military fields.Harbor detection is an important aspect of remote sensing ocean application research, and monitoring ocean targets using SAR images is an important research direction.A long coastline has several small harbors, which consist of many docks and jetties.Large harbors are monitored all the time;by contrast, small harbors of less importance are neglected.However, these small harbors, where ships temporarily park, may host certain important targets at critical times.Thus, small harbors should also be automatically detected from large-scene SAR images.This study aims to detect small harbors in large-scene SAR images efficiently.Detection is complex because the coastline is always flexible.The shape of small harbors is irregular, and detectors can easily be deceived by the changing coastline.Moreover, a large scene may contain many false small harbors.Thus, detection may suffer from a high false alarm rate and low accuracy, making it unusable. 
Small harbors have geometrical and radiation characteristics.The geometrical characteristics refer to the natural and artificial shapes of small harbors;thus, the shape of small harbors varies.The radiation characteristics refer to the complexity of the SAR image of the harbors compared with optical images because of the interference of speckle and other strong scatterers.An extraction method for small harbor areas in large-scene SAR images is proposed in this study based on these characteristics.The method is divided into three stages.First, a multiscale corner detector is adopted to extract potential areas.At different scales of a coastline, a corner represents varied information.An original-scale image yields several corners, some of which are small ships and docks, whereas others are speckle noise.These corners disappear when the image is downsampled, and only the corner of the entire harbor remains.Therefore, the coastline characteristics of the harbor area at different scales indicate that the multiscale corner detection method can extract potential harbor areas.Second, an improved jetty detection method is adopted to extract a finer harbor area.Given the inevitable presence of harbor jetties, the detection result can significantly reduce the search range of the SAR image and further confirm the position of the harbor area.The harbor area is thus extracted based on jetty detection in the potential region.The candidate region of the jetty can be removed using the proposed method, and the exact harbor area can be obtained.Third, the closed-shoreline measurement method is used to eliminate false harbor identifications.Several small harbor-like structures can easily be false targets, and the natural environment of the coastline is complex.To evaluate the proposed method, two typical harbor SAR images are processed, namely, the Radarsat-2 image of the Yantai Harbor and the TerraSAR-X image of the Visakhapatnam Harbor in 
India;both images have a resolution of 1 m.In comparison with the method proposed in the literature, the false alarm rate decreases from 10% to 6.6%, and the accuracy rate increases from 91.9% to 93.3%.However, the calculation process is complex, which increases the processing time from 11.58 s to 13.26 s.The best previously proposed detection method establishes a harbor feature model and detects harbors in remote sensing images based on this feature model.First, harbor jetty extraction and discrimination are performed based on the geometric and topological features of the harbor.Subsequently, the jetty key points are selected, and the coastline closure between key points is calculated according to the contextual and geometric features of the harbor.Finally, harbor detection is completed based on the closure principle.That method can easily produce false alarms without coarse-to-fine processing. A new detection scheme is proposed in this study, and a complete flow of the small harbor detection method is presented.  
      Keywords: SAR images; small harbor extraction; corner; dock; closure of coastline   
    • Wang Chunyan, Xu Aigong, Zhao Xuemei, Jiang Yong
      Vol. 22, Issue 8, Pages: 1135-1143(2017) DOI: 10.11834/jig.160596
      Abstract: Image classification is a significant part of image processing, and the accuracy of the classification result has a considerable influence on subsequent processes, such as feature extraction and object recognition.High resolution remote sensing images can present detailed information about the objects of interest, which provides a sufficient basis for precise image classification.However, new questions and difficulties exist in the classification of high resolution images.These difficulties are caused by the increasing uncertainty of the class of pixels, as well as the complexity of the correlation characteristics of different classes, which are attributed to the enhanced spectral heterogeneity within the same object and spectral similarity between different objects.For example, the distribution of an object in feature space may be asymmetric or multimodal, and the distributions of different objects may overlap considerably.Traditional fuzzy clustering algorithms, such as the fuzzy c-means (FCM) algorithm, can effectively solve the problem introduced by the uncertainty of the class of pixels and obtain satisfactory classification results for low or medium resolution remote sensing images.However, given the preceding characteristics of high resolution remote sensing images, traditional fuzzy clustering algorithms cannot deal with the influence of correlations between the classes of pixels on the classification results.A fuzzy neural network has a powerful ability to approximate numerical solutions and describe the characteristics of uncertainty.The fuzzy neural network model treats the fuzzy membership function of a pixel as a hidden input to tackle the uncertainty of pixels and determines the interrelation by solving the model parameters of the fuzzy neural network.Thus, the fuzzy neural network can solve the problems attributed to the uncertainty of the subordination of pixels and the correlation between them in 
high resolution remote sensing images.To this end, this paper proposes a supervised classification algorithm for high resolution remote sensing images based on an improved fuzzy neural network. A three-layer feed-forward neural network is designed, containing input, hidden, and output layers.Training samples acquired by supervised sampling are used to train the network and estimate the hidden parameters.Classification of the detected image is carried out by taking the pixels of the detected image as inputs.In the training process, the input layer accepts the gray values of the training samples.For each class, the training samples are used to calculate the histogram.The input value is any gray level present in the training samples, and the expected output is the histogram frequency of the corresponding gray level in its class.If a gray level is not contained in the training data, then its expected frequency is zero.The function of the input layer is to transfer the data directly to the hidden layer;that is, the connection between the input and hidden layers has no parameters.A fuzzy membership function, which is the Gaussian membership function in the proposed algorithm, is defined for each node in the hidden layer.The fuzzy operations are performed there, and the number of nodes is equal to the number of classes.The function is the uncertain expression of the membership degree of the input variables.The input of the output layer is the linear combination of the outputs of the neural nodes of the hidden layer.The number of neural nodes is equal to the number of classes, and the activation function is a custom piecewise linear function.The defined activation function satisfies the following constraints:when the linear combination of the membership functions of a training datum falls between zero and one, the output membership function after the training process remains unchanged;when the linear combination is less than zero, 
the output membership function after the training process is zero;when the linear combination is greater than one, the output membership function after the training process is the maximum of the frequencies present in the histogram.The frequencies of all the training data are taken as the expected output, and the corresponding parameters, including the coefficients of the membership function, the mean, the standard deviation, and the weights and offsets in the summation layer, are estimated through the gradient descent algorithm.Finally, the fuzzy algorithm segments the images based on the maximum membership function. The proposed algorithm, the Gaussian membership function algorithm, the maximum likelihood algorithm, and the FCM algorithm are applied to high-resolution synthetic and real remote sensing images.The histogram fitting results of the Gaussian membership function and the proposed algorithm are displayed along with the classification results and the precision evaluation index.Qualitative and quantitative experimental results demonstrate that the proposed algorithm can characterize the asymmetric distributions exhibited in high resolution remote sensing images and fits them better than the Gaussian membership function algorithm.Moreover, the proposed algorithm obtains more accurate results than traditional classification algorithms. 
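The forward pass of the three-layer network described above can be sketched under simplifying assumptions:scalar gray-level inputs, one Gaussian membership node per class, and illustrative parameter values in place of coefficients that would actually be fitted by the gradient descent step.All names are hypothetical.

```python
import numpy as np

def forward(gray, mu, sigma, W, b, fmax):
    """Schematic forward pass of the fuzzy neural network.

    gray: (n,) gray levels; mu/sigma: (k,) per-class Gaussian membership
    parameters; W, b: (k, k)/(k,) output-layer weights and offsets;
    fmax: (k,) per-class histogram maximum used by the clamped activation.
    Returns the index of the maximum-membership class for each pixel.
    """
    # hidden layer: Gaussian membership of each gray level in each class
    m = np.exp(-((gray[:, None] - mu[None, :]) ** 2) / (2.0 * sigma[None, :] ** 2))
    z = m @ W.T + b  # output layer: linear combination of memberships
    # custom piecewise-linear activation: identity on [0, 1], clamped to 0
    # below and to the class's maximum histogram frequency above
    out = np.where(z < 0, 0.0, np.where(z > 1, fmax[None, :], z))
    return out.argmax(axis=1)  # maximum-membership decision
```

With identity weights, the decision reduces to picking the class whose Gaussian membership is largest; the trained off-diagonal weights are what let the model express the inter-class correlations discussed above.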
This paper proposes an improved fuzzy neural network supervised classification algorithm for high resolution remote sensing images based on the Gaussian membership function.The proposed algorithm establishes a Gaussian membership function in homogeneous regions to characterize the uncertainty of pixels and designs a fuzzy neural network model to represent the relationships between different classes.This algorithm solves the problems attributed to high resolution remote sensing images, improving both the histogram fitting results and the accuracy of the classification decision.Qualitative and quantitative analyses demonstrate that the proposed algorithm improves image classification accuracy by improving the fitting ability on complex distributions.Although the proposed algorithm can improve the quality of the model and the accuracy of the classification results, the following problems remain.First, pixel-based image classification algorithms are sensitive to noise and outliers and, without considering the spatial correlation of pixels, cannot distinguish pixels with similar features, such as water and shadows.Therefore, spatial relationships between pixels should be considered in future research.Second, texture features, which are important for image classification, are not involved.In future work, we will attempt to study and construct the texture features of high-resolution remote sensing images.The texture features of the detected image will be employed to develop an algorithm suitable for most kinds of images.  
      Keywords: high resolution remote sensing image; image classification; fuzzy neural network; Gaussian membership function; supervised learning; histogram fitting   
    • Zhang Wei, Zheng Ke, Tang Ping, Zhao Lijun
      Vol. 22, Issue 8, Pages: 1144-1153(2017) DOI: 10.11834/jig.170139
      Abstract: Monitoring land cover information is fundamental for environmental change studies, land resource management, and sustainable development;it plays an important role in global resource monitoring and change detection.Improving the land cover classification accuracy of moderate resolution remote sensing images is considerably significant.For land cover classification, some researchers have combined texture features with spectral values.The classification accuracy has improved;however, the ability of texture features to improve the classification accuracy is still extremely limited.Extracting expressive features is the key to remote sensing image classification.In recent years, the deep convolutional neural network (CNN) has made great breakthroughs in the fields of image classification, object detection, and image semantic segmentation;it has a remarkable capability for feature learning and representation compared with traditional machine learning methods.This feature extraction capability of CNNs for improving the accuracy of land cover classification should be investigated. 
Given the superior characteristics of CNNs, this paper conducts exploratory research on the classification of moderate resolution remote sensing images using features extracted by a deep CNN.In detail, GF-1 multispectral remote sensing imagery with 16 m resolution is used as experimental data, and the pretrained AlexNet is used for feature extraction with a support vector machine (SVM) as the classifier.The proposed method has three steps.1) Preprocessing:The image inputted to AlexNet must have three bands, because the pretrained AlexNet is trained on ImageNet.Thus, principal component analysis is applied to the GF-1 four-band multispectral image to derive the first three principal components.2) Deep feature extraction:For each pixel, the surrounding image patch with a fixed window size is first upsampled to 224×224 by the nearest neighbor interpolation method to conform to the required input size of the AlexNet model;the mean image of the training set is then subtracted for normalization, and the result is finally fed into the pretrained AlexNet to extract the deep features of the convolutional or fully connected layers.3) Classification:The extracted deep features are fed into an SVM classifier with a linear kernel.The feature representation capability of the various layers of AlexNet and the effectiveness of different window sizes are evaluated and analyzed.Furthermore, comparisons with classification results obtained from spectral values and spectral-texture features are conducted to assess the potential of the proposed method. 
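The two concrete preprocessing steps above, band reduction by principal component analysis and nearest-neighbor upsampling of per-pixel patches to the AlexNet input size, can be sketched as follows.The AlexNet feature extraction and SVM stages are omitted since they depend on a pretrained model;function names are illustrative.

```python
import numpy as np

def first_three_pcs(img):
    """PCA over the bands of an (H, W, 4) multispectral image; returns the
    (H, W, 3) image of the first three principal components."""
    H, W, B = img.shape
    X = img.reshape(-1, B).astype(float)
    X -= X.mean(axis=0)
    # eigenvectors of the band covariance, sorted by descending variance
    w, V = np.linalg.eigh(np.cov(X, rowvar=False))
    V = V[:, np.argsort(w)[::-1][:3]]
    return (X @ V).reshape(H, W, 3)

def upsample_nn(patch, size=224):
    """Nearest-neighbor upsampling of an (h, w, c) patch to (size, size, c),
    matching the fixed input resolution expected by AlexNet."""
    h, w = patch.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return patch[rows][:, cols]
```

A 9×9 window (the best size reported below in the results) around each pixel would be cropped from the three-component image, upsampled with `upsample_nn`, normalized, and passed through the network.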
Experimental results show the following. 1) Features from the fully connected layers are more effective than those from the convolutional layers, and the best feature extraction layer is the sixth layer of AlexNet (fc6, its first fully connected layer). 2) The window size used for feature extraction influences the classification results: accuracy first increases and then decreases as the size grows, and the best window size is 9×9. 3) The classification accuracy obtained with deep AlexNet features is higher than that obtained with spectral values alone or with spectral-texture features. 4) The proposed method has a drawback caused by the ReLU nonlinear activation function and max pooling in AlexNet: the actual outlines of classes with extremely high or extremely low spectral values tend to expand or shrink. In conclusion, a deep CNN can extract accurate features for land covers and achieve high classification accuracy, providing a valuable reference for land cover classification. In future work, further measures, such as selecting an effective activation function and pooling method to build and train a new CNN model for land cover classification, should be considered to improve the proposed method.  
      关键词:convolutional neural network(CNN);AlexNet;feature extraction;land cover classification;support vector machine(SVM)   
    • Multi-viewpoints remote sensing images registration using mixed features

      Wu Fangqing, Yang Yang, Pan Anning, Yang Kun
      Vol. 22, Issue 8, Pages: 1154-1161(2017) DOI: 10.11834/jig.160542
      摘要:Remote sensing image registration is a key technology in the field of remote sensing image processing and has been widely used in military and civil applications, such as natural disaster damage assessment, resource censuses, military damage assessment, environment monitoring, and ground target identification. Remote sensing image registration is the fundamental task of aligning two or more images of the same scene (i.e., reference and sensed images), which can be multitemporal (taken at different times), multisource (derived from different sensors), or multi-view (obtained from different viewpoints). It aims to accurately recover the geometric transformation relationship between the reference and sensed images. Non-rigid geometric distortion occurs among multi-viewpoint remote sensing images because of changes in the aerial view and the complexity of the spatial distribution and geometric shapes of terrestrial objects, especially roads, rivers, viaducts, buildings, mountains, and other linear areas, thereby increasing the difficulty of registration. Few methods have attempted to solve the non-rigid distortion problem in multi-viewpoint remote sensing image registration; most existing methods focus on rigid and affine geometric distortions, and only a single feature is typically applied for point correspondence estimation. 
A new multi-viewpoint remote sensing image registration algorithm is proposed to address this problem and achieve high registration accuracy. The proposed algorithm includes four steps. 1) The scale-invariant feature transform (SIFT) detector is employed to extract feature points from the reference and sensed images. The two sets of extracted feature points are considered salient image features and are used to build putative feature correspondences. 2) Global and local mixed features are constructed within a non-rigid point set registration algorithm for registering the two sets of SIFT feature points, which consists of an alternating two-step process (i.e., correspondence estimation and transformation update). Global and local geometric structure features are first defined using the shape context and local distance descriptors to measure the global and local structural differences between the SIFT feature point sets, respectively. The two features are then combined into global and local mixed features through a cost matrix. This step provides a flexible way to estimate point correspondences by minimizing global or local geometric structural differences with a linear assignment solution. An annealing scheme is designed to shift the cost minimization gradually from local to global and the thin plate spline (TPS) transformation from rigid to non-rigid during registration, thereby improving the correspondence estimation and enhancing the interaction between the two steps. 3) After the point correspondences are calculated in Step 2, the backward approach and the TPS transformation model are applied to estimate the image transformation between the reference and sensed images. 4) After the TPS transformation model is estimated, the transformed image is computed using the TPS mapping functions constructed in the previous step. A bicubic interpolation algorithm is applied to the sensed image on a regular grid: each output pixel is mapped back into the sensed image using the estimated mapping functions and the backward approach, so neither holes nor overlaps (due to discretization and rounding) occur in the output (i.e., the transformed image). The proposed algorithm enhances registration accuracy for multi-viewpoint remote sensing images with non-rigid distortion by improving the geometric structure feature descriptions and exploiting the complementary relationship between the global and local geometric structure features of the SIFT feature point sets. The main process of the proposed algorithm is implemented in Matlab, and the Jonker-Volgenant algorithm in C++ is wrapped as a Matlab mex function. Two series of remote sensing image data are used to evaluate the algorithm: 1) satellite image pairs from Google Earth with different viewpoints, comprising nine image pairs taken over London, Las Vegas, Paris, the Mekong River, Tokyo, Hawaii, Beijing, the Niulanjiang River, and New York; and 2) aerial image pairs from an unmanned aerial vehicle (UAV) with different viewpoints, comprising five image pairs taken over the Chenggong Campus of Yunnan Normal University. Moreover, the proposed algorithm is compared against three state-of-the-art methods (i.e., SIFT, speeded-up robust features, and coherent point drift). For quantitative comparison, a reliable and fair criterion is employed: nine pairs of corresponding points between the transformed and reference images are manually determined as ground truth, with all corresponding points well distributed and selected on easily identified landmarks in the transformed and reference images. The registration error is defined as the root mean square coordinate error over the determined corresponding points. Results show that the proposed algorithm effectively improves the registration accuracy of the SIFT feature point sets and provides accurate registration results for multi-viewpoint remote sensing images with non-rigid distortion. In this study, a multi-viewpoint remote sensing image registration algorithm is realized by mixing the global and local geometric structure features of SIFT feature point sets. The proposed algorithm includes 1) SIFT feature extraction, 2) feature point set registration, 3) image transformation model estimation, and 4) image transformation. The main contribution of this work is the successful solution of the multi-viewpoint registration problem under moderate ground relief variations and imaging viewpoint changes by the proposed four-step algorithm, which also yields the best alignments in most cases compared with the three state-of-the-art methods. The experimental results on Google Earth satellite images and UAV images show that the proposed algorithm can accurately register multi-viewpoint remote sensing images with non-rigid distortion and is applicable to general multi-viewpoint remote sensing image registration problems.  
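The evaluation criterion described above, the root mean square coordinate error over manually selected corresponding points, can be sketched as follows. The nine landmark pairs here are illustrative placeholders (the paper uses nine manually picked pairs per image):

```python
import numpy as np

def registration_error(pts_transformed, pts_reference):
    """Root mean square coordinate error between corresponding points.

    Both arrays have shape (n, 2): n manually selected landmark pairs
    in the transformed and reference images.
    """
    diff = pts_transformed - pts_reference
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))

# Nine hypothetical landmark coordinates in the reference image
ref = np.array([[10.0, 12.0], [40.0, 8.0], [25.0, 30.0],
                [60.0, 55.0], [80.0, 20.0], [15.0, 70.0],
                [50.0, 90.0], [90.0, 75.0], [35.0, 50.0]])
warped = ref + 1.0  # every point offset by (1, 1) -> error is sqrt(2)
print(registration_error(warped, ref))  # ≈ 1.4142
```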
      关键词:multi-viewpoints;remote sensing;non-rigid distortion;mixed feature;image registration   
    • Li Qing, Li Yu, Wang Yu, Zhao Quanhua
      Vol. 22, Issue 8, Pages: 1162-1174(2017) DOI: 10.11834/jig.160588
      摘要:Gestalt is a psychology term that means "unified whole". It refers to the theory that describes how people tend to group visual elements when certain principles are fulfilled. Gestalt theory includes principles such as similarity, row continuation, proximity, and closure, which are based on the idea that the overall cognition of an object can be obtained by perceiving its parts. However, quantizing these principles mathematically is difficult in practice because they are abstract psychological concepts. In fact, only a few principles, such as similarity and proximity, are used in the literature, and they are interpreted in a simple way. Building extraction based on Gestalt laws is therefore challenging and demanding. In this paper, the idea of Gestalt is applied to designing a building extraction algorithm for remote sensing images by testing candidate edge points, defining the relationships among edge points according to Gestalt principles, finding the edge points, and fitting the edges of buildings. This paper proposes a new method for extracting buildings from high resolution remote sensing images based on Gestalt rules. First, the scale invariant feature transform (SIFT) algorithm is used to extract key points as candidate edge points in a given remote sensing image, where each key point is attributed with features such as position, orientation, scale, and assessment. A Gestalt space, a 4D domain containing the preceding features, is established, and the subsequent operations are completed in this space. The Gestalt principle of row continuation determines whether all candidate edge points lie on the edges of buildings. Each candidate edge point involved in the operation is regarded as an integral Gestalt, and a new integral Gestalt is obtained after the operation according to the principle of row continuation. 
If the assessment of the new integral Gestalt is larger than a given threshold, then the integral Gestalts, namely the candidate edge points involved in the operation, fulfill the rule of row continuation and all lie on the edges of buildings. Consequently, the set of edge points can be obtained. Finally, the edges of buildings are fitted with the different sets of edge points, and the silhouettes of buildings are formed by combining their extracted edges. Experiments are carried out on WorldView-2 images, and the proposed method is compared with other building extraction algorithms, such as the mean shift algorithm and the multi-scale segmentation and region merge-based building extraction algorithm, to test it quantitatively and qualitatively. The proposed method extracts buildings from the image more correctly and completely. We use evaluation measures widely accepted for building extraction to quantify the accuracy of the extraction results. The extracted buildings and manually delineated buildings are compared pixel by pixel, and every pixel in the image is categorized into one of four types: true positive (TP), where both the manual and automated methods label the pixel as belonging to a building; true negative (TN), where both methods label the pixel as belonging to the background; false positive (FP), where the automated method incorrectly labels the pixel as belonging to a building; and false negative (FN), where the automated method incorrectly labels a pixel truly belonging to a building as background. The number of pixels falling into each of the four categories is determined. In addition, the branching factor (BF), miss factor (MF), detection percentage (DP), and quality percentage (QP) are computed. The BF measures the commission error, in which the system incorrectly labels background pixels as buildings. The MF measures the omission error, in which the system incorrectly labels building pixels as background. The DP denotes the percentage of building pixels correctly labeled by the automated process. The QP measures the absolute quality of the extraction and is the most stringent measure. The DP and QP of the proposed method are higher than those of the comparison methods, and the DP of the proposed method exceeds 95% in all experiments. The proposed method is thus more accurate than the other building extraction methods, and all experiments demonstrate its feasibility and effectiveness. Experimental results show that the proposed method can extract buildings well from high resolution remote sensing images, demonstrating that it is a feasible and effective method for building extraction. The proposed method is suitable for extracting edges along the linear features of buildings in remote sensing images. Its accuracy decreases at low resolution; therefore, the suggested resolution of the experimental image is finer than five meters.  
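The pixel-wise evaluation described above can be sketched as follows, assuming the standard definitions BF = FP/TP, MF = FN/TP, DP = 100·TP/(TP+FN), and QP = 100·TP/(TP+FP+FN); the masks here are synthetic, not WorldView-2 results:

```python
import numpy as np

def building_metrics(extracted, reference):
    """Pixel-wise evaluation of a building extraction result.

    extracted, reference: boolean masks (True = building pixel).
    Assumes the standard definitions BF = FP/TP, MF = FN/TP,
    DP = 100*TP/(TP+FN), QP = 100*TP/(TP+FP+FN).
    """
    tp = np.sum(extracted & reference)    # building in both masks
    fp = np.sum(extracted & ~reference)   # background labeled as building
    fn = np.sum(~extracted & reference)   # building labeled as background
    return {
        "BF": fp / tp,
        "MF": fn / tp,
        "DP": 100.0 * tp / (tp + fn),
        "QP": 100.0 * tp / (tp + fp + fn),
    }

# Synthetic 10x10 masks: a reference building block and an extraction
# shifted down by one row (TP=30, FP=6, FN=6)
reference = np.zeros((10, 10), dtype=bool)
reference[2:8, 2:8] = True
extracted = np.zeros((10, 10), dtype=bool)
extracted[3:9, 2:8] = True

m = building_metrics(extracted, reference)
print(m)  # BF=0.2, MF=0.2, DP≈83.33, QP≈71.43
```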
      关键词:gestalt;building extraction;scale invariant feature transform(SIFT) algorithm;row continuation;edge extraction   