Abstract: Content-based image retrieval uses features extracted from an image to retrieve similar images from a large-scale dataset accurately and with low memory and time consumption. Scale-invariant feature transform (SIFT) is robust to translation, scaling, rotation, viewpoint change, and occlusion, and its features can be extracted quickly. Thus, SIFT is widely used in both theory and practice. However, SIFT has some shortcomings, such as a lack of spatial geometric information and color information. Convolutional neural networks (CNNs) have good domain transferability, and deep features from a pre-trained CNN can be applied to various domains. CNN deep features have recently attracted considerable attention and exhibit superior performance over SIFT. However, in contrast to SIFT, CNN features lack shallow information. Thus, SIFT is usually fused with CNN features and other shallow features. This report reviews the recent advances and challenges in image retrieval in China and worldwide, covering shallow features, deep features, and feature fusion. Future development trends are also explored. For shallow features, we mainly review SIFT and its variants, the encoding methods, and the development of these methods. For deep features, we divide the descriptors into categories according to the type of CNN layer used: fully connected layer, convolutional layer, and softmax layer. Many features can be extracted from a convolutional layer, and many pooling methods have been proposed.
The encoding methods of SIFT mainly include bag of features (BOF), vector of locally aggregated descriptors (VLAD), Fisher vector (FV), and triangulation embedding (TE), and they mostly consist of two steps: embedding and aggregation (or pooling). For CNN features, features from the fully connected layer are typically used because of their good transferability and accuracy. However, deep features from the convolutional layer have recently become increasingly attractive because they can be effectively combined with a variety of pooling methods, such as sum-pooling, max-pooling, VLAD-pooling, and FV-pooling, and they perform well in image classification and retrieval. The fusion methods can mainly be divided into five types: concatenation, kernel fusion, graph fusion, index-level fusion, and score-level fusion. Concatenation, kernel fusion, and index-level fusion work directly on different features, whereas graph fusion and score-level fusion work on the retrieval results of different features. Fusion exploits the complementarity of different features and can effectively improve image retrieval accuracy.
SIFT and CNN features are complementary: SIFT contains rich low-level information, whereas CNN features contain rich high-level semantic information; SIFT has good invariance properties, which CNN features lack. Fusion is an effective way to maximize image information. However, time and space consumption will inevitably increase, and a good algorithm for distinguishing good features from bad ones is yet to be studied. At present, the generalizability and geometric invariance of CNN features are inferior to those of SIFT; this issue continues to be a challenge for image retrieval researchers. The generalizability of CNN features is limited by the domain and statistical differences between the source task (usually ImageNet) and the target task. Fine-tuning is a good strategy to solve this problem; however, this approach needs an additional labeled dataset similar to the target task. Enhancing the geometric invariance of CNN features inevitably increases descriptor space consumption and extraction time, and only scale invariance is usually considered for simplicity, ignoring other aspects of invariance. Moreover, the number of CNN features from one image is usually much smaller than that of SIFT; thus, insufficient information is captured for encoding. The most commonly used CNNs are designed for image classification tasks and not for image retrieval. However, image retrieval is a more fine-grained task; a relevant algorithm needs to find similar images, not just images from the same class. Thus, a CNN trained for image retrieval may be a good future research direction. More work is still needed to strike a better balance among generalizability, invariance, memory consumption, and extraction time for an effective and efficient image retrieval descriptor.
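The two-step view of encoding (embedding, then aggregation) and the pooling of convolutional features described in this abstract can be illustrated with a minimal NumPy sketch. The function names, array shapes, and the hard nearest-center assignment are assumptions made for illustration; the reviewed methods differ in normalization and assignment details.

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit Euclidean length (eps guards against division by zero)."""
    return v / (np.linalg.norm(v) + eps)

def sum_pool(feature_map):
    """Sum-pool a (C, H, W) convolutional feature map into a C-dimensional vector."""
    return feature_map.reshape(feature_map.shape[0], -1).sum(axis=1)

def max_pool(feature_map):
    """Max-pool a (C, H, W) convolutional feature map into a C-dimensional vector."""
    return feature_map.reshape(feature_map.shape[0], -1).max(axis=1)

def vlad(descriptors, centers):
    """VLAD-style encoding of N local descriptors (N, D) against K centers (K, D).

    Embedding: each descriptor is assigned to its nearest center.
    Aggregation: residuals (descriptor - center) are summed per center.
    """
    dist2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = dist2.argmin(axis=1)
    residuals = np.zeros(centers.shape, dtype=float)
    for k in range(centers.shape[0]):
        members = descriptors[nearest == k]
        if members.shape[0] > 0:
            residuals[k] = (members - centers[k]).sum(axis=0)
    # Flatten to a single K*D vector and normalize for cosine comparison.
    return l2_normalize(residuals.ravel())
```

The same `vlad` function applies whether the local descriptors are SIFT vectors or the per-location columns of a convolutional feature map, which is one reason VLAD-pooling carries over to CNN features.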
Abstract: Sparse denoising algorithms achieve a good denoising effect but are inefficient because of the complex matrix operations in the sparse decomposition and dictionary training stages. Although classification has been applied in the dictionary training stage, the method can still be enhanced. An improved algorithm is proposed to address the inefficiency caused by complex matrix operations and global searching of the dictionary in the sparse decomposition stage. First, a local orthogonal matching pursuit algorithm is proposed, which introduces dictionary clustering into orthogonal matching pursuit to generate sub-dictionaries. Second, a neighbor-prioritizing method is proposed, which selects the optimal sub-dictionaries as the matching space for sparse decomposition to optimize the denoising effect. Finally, the content clusters of the noisy image are denoised using the neighbor local K-SVD algorithm, following the clustering-based denoising method. Experiments on several images in the USC standard image library show that the proposed method yields a better denoising effect than other algorithms. On average, the peak signal-to-noise ratio of the proposed algorithm is 1.53 dB higher than that of the K-SVD algorithm, 0.72 dB higher than that of the BM3D algorithm, and 0.5 dB higher than that of the CSR algorithm. The proposed algorithm also runs faster than the original algorithm. The proposed algorithm improves the effect and efficiency of gray image denoising and has practical value for gray images with much detail and texture.
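To make the role of sub-dictionaries concrete, the following is a minimal sketch of standard orthogonal matching pursuit with an optional restriction of the atom search to a subset of columns. The `candidates` parameter is a hypothetical stand-in for a sub-dictionary; how the paper clusters the dictionary and prioritizes neighboring sub-dictionaries is not specified in the abstract and is not reproduced here.

```python
import numpy as np

def omp(dictionary, signal, n_nonzero, candidates=None):
    """Greedy sparse coding: signal ~ dictionary @ code with few nonzeros.

    dictionary: (d, n_atoms) array whose columns are assumed unit-norm atoms.
    candidates: optional column indices (a hypothetical sub-dictionary);
                when given, only these atoms are searched.
    """
    n_atoms = dictionary.shape[1]
    cols = np.arange(n_atoms) if candidates is None else np.asarray(candidates)
    sub = dictionary[:, cols]
    signal = np.asarray(signal, dtype=float)
    residual = signal.copy()
    support = []          # positions inside `sub` selected so far
    coef = np.zeros(0)
    for _ in range(min(n_nonzero, sub.shape[1])):
        # Pick the atom most correlated with the current residual.
        corr = np.abs(sub.T @ residual)
        if support:
            corr[support] = -1.0  # never reselect an atom
        support.append(int(np.argmax(corr)))
        # Orthogonal step: refit all selected atoms by least squares.
        coef, *_ = np.linalg.lstsq(sub[:, support], signal, rcond=None)
        residual = signal - sub[:, support] @ coef
        if np.linalg.norm(residual) < 1e-10:
            break
    code = np.zeros(n_atoms)
    code[cols[support]] = coef
    return code
```

Restricting the search to `candidates` shrinks both the correlation step and the least-squares systems, which is the source of the speed-up that a local search over sub-dictionaries aims for.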
Abstract: The rapid development of face recognition has made obtaining face feature information a major research area. Local feature representation can depict the local feature information of the face in much detail and is therefore widely applied in facial feature extraction. At present, the local binary pattern (LBP) algorithm is the most widely used. However, LBP still has some defects: it is sensitive to illumination changes and noise. The local directional pattern (LDP) algorithm obtains edge gradient values by convolving the local neighborhood, and these values are more stable than the pixel values used by LBP. Because of this advantage over LBP, LDP has developed rapidly in recent years. However, the traditional LDP algorithm cannot simultaneously balance three aspects, i.e., sufficient feature extraction, robustness to noise and illumination, and short recognition time. This study presents a novel approach based on the double-space local directional pattern (DSLDP) algorithm for face recognition to solve these problems. The DSLDP algorithm strikes a good balance among feature extraction, stability, and recognition time.
First, each 3×3 neighborhood of the facial image yields eight edge response values by convolving the local neighborhood with eight Kirsch template operators. The eight Kirsch masks represent eight directions, i.e., east, west, south, and north corresponding to the linear edge and northeast, northwest, southwest, and southeast corresponding to the line edge. Then, the difference of each pair of neighboring edge response values is calculated to form eight new difference directions. DSLDP takes the direction of the edge gradient value with the largest absolute value in each sub-neighborhood. The two directions are encoded into a double-digit octal number to produce the DSLDP code. Finally, the face image is divided into several sub-blocks, and the face descriptor is represented by the global concatenated histogram of the DSLDP map extracted from these sub-blocks. Sub-blocks are weighted by information entropy. The dimensionality of the face features is reduced by principal component analysis. The nearest neighbor classifier is used to classify the faces and obtain the identification results.
The face representation method is compared with recently developed algorithms, such as LDP, significance local directional patterns, enhanced local directional patterns, local directional number patterns, difference local directional patterns, center of symmetry local directional patterns, and gradient center of symmetry local directional patterns. The experimental results on the ORL, AR, and CAS-PEAL face databases show that the DSLDP algorithm exhibits better recognition performance than the other algorithms. The average recognition rate of the DSLDP algorithm is 97.82% on the ORL database with five test samples. The average recognition rates of the DSLDP algorithm on the AR illumination, expression, shelter A, and shelter B databases are 98.00%, 98.33%, 99.33%, and 98.33%, respectively. The average recognition rates on the CAS-PEAL illumination, expression, and shelter databases are 99.33%, 95.33%, and 90.00%, respectively. 1) In terms of face feature extraction, the method considers not only the change in the inner edge response values but also the change in the outer edge response values. The method also fully utilizes the face information extracted in the strength and gradient spaces. 2) In terms of stability, compared with other single face recognition algorithms based on LDP, the DSLDP algorithm considers only the maxima of the edge response and difference edge response values. The DSLDP algorithm highlights the main edge gradient information and avoids the interference of unimportant information. Therefore, the algorithm shows strong robustness to variations in posture, illumination, noise, and facial expression. 3) In terms of recognition time, the DSLDP code is a double-digit octal number; thus, the number of feature patterns is reduced to 64 and the recognition time is markedly decreased. Therefore, the DSLDP algorithm achieves a good balance among face feature extraction, stability/robustness, and recognition time. The algorithm is shown to be effective.
Keywords: local directional pattern; double-space local directional pattern; Kirsch operator; information entropy; principal component analysis; nearest neighbor classifier; face recognition
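The coding step described in this abstract (eight Kirsch responses, differences of neighboring responses, and a two-digit octal code) can be sketched for a single 3×3 patch as follows. The ordering of the directions, the choice of which neighbor is subtracted, and the packing of the two digits as `8 * strength_dir + gradient_dir` are assumptions for illustration; the abstract does not fix these details.

```python
# Ring of the eight neighbours of the centre pixel, listed counter-clockwise
# starting from east: E, NE, N, NW, W, SW, S, SE, as (row, column) indices.
RING = [(1, 2), (0, 2), (0, 1), (0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]

def kirsch_responses(patch):
    """Return the eight Kirsch edge responses of a 3x3 patch (list of lists).

    The Kirsch mask for direction i has weight 5 on ring positions
    i-1, i, i+1 and weight -3 on the other five positions (centre weight 0).
    """
    ring_values = [patch[r][c] for r, c in RING]
    total = sum(ring_values)
    responses = []
    for i in range(8):
        strong = ring_values[(i - 1) % 8] + ring_values[i] + ring_values[(i + 1) % 8]
        responses.append(5 * strong - 3 * (total - strong))
    return responses

def double_space_code(patch):
    """Combine two directions into one code in the range 0..63 (assumed packing)."""
    m = kirsch_responses(patch)
    # Differences between neighbouring edge responses (assumed pairing: i and i+1).
    d = [m[i] - m[(i + 1) % 8] for i in range(8)]
    strength_dir = max(range(8), key=lambda i: abs(m[i]))  # largest |response|
    gradient_dir = max(range(8), key=lambda i: abs(d[i]))  # largest |difference|
    return 8 * strength_dir + gradient_dir
```

Because each direction takes one of eight values, the code takes at most 64 distinct values, which matches the 64 feature patterns mentioned in the abstract.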
Abstract: Vehicle detection and attribute recognition are basic tasks in an intelligent traffic system (ITS), which aims to extract the key features of target vehicles. Most solutions split the extraction of key features into several individual modules, such as vehicle detection, vehicle color recognition, and vehicle type recognition. However, this type of solution suffers from many problems in practical scenarios. First, the coupling between detection and recognition algorithms increases the complexity of algorithm design. Second, deep learning-based algorithms are data-driven; thus, the algorithm designer must collect training data for every single function module. However, data collection is costly and time consuming. Moreover, the more modules an ITS has, the higher the cost of computational and communication resources. We propose a unified framework that integrates the vehicle detection and attribute recognition functions to settle these issues. Vehicle detection and attribute recognition tasks can be viewed as a classification problem between background and foreground regions. Color and type are two important holistic features of a vehicle. Combining the two features as the foreground region label can enlarge the diversity between foreground and background regions. The greater the diversity between foreground and background regions, the fewer the false positive and false negative detection cases. To implement this idea, we utilize the scalability of the multitask learning algorithm to complete the vehicle attribute recognition and detection tasks at the same time. Specifically, the multitask paradigm is added on top of the region-based detection algorithm. In the training phase, instead of deploying the raw multitask learning algorithm, we integrate the online hard example mining algorithm into our framework to cope with the negative effect caused by the long-tail phenomenon. In the prediction phase, the proposed framework outputs the vehicle location, vehicle color, and vehicle type information in a forward pass. To verify the proposed vehicle detection and attribute recognition framework, we construct a large-scale on-road vehicle dataset, which contains 12 712 images and 19 398 vehicles. In this dataset, every vehicle in an image is annotated with a bounding box and its corresponding type and color labels. We achieve a mean average precision of 85.6%, which is better than that of the SSD and Faster-RCNN algorithms. For the recognition tasks, we achieve 91.3% and 91.8% accuracy for color and type recognition, respectively. Type and color are two important visual cues for vehicles. Thus, integrating these attributes into the detection algorithm can boost the detection performance and result in good recognition performance. Moreover, a highly integrated system can make the ITS computationally efficient.
Keywords: vehicle detection; vehicle attribute recognition; hard example mining; long-tail distribution; multi-task learning
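The combination of a multitask loss with online hard example mining can be sketched as follows: per-region losses from the three tasks are summed, and only the regions with the largest combined loss contribute to the update. The function names, the equal task weights, and the use of precomputed per-region losses are assumptions for illustration; the abstract does not give the paper's exact loss.

```python
def hardest_indices(losses, keep):
    """Return the indices of the `keep` largest losses, in ascending index order."""
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return sorted(order[:keep])

def multitask_ohem_loss(det_losses, color_losses, type_losses, keep,
                        weights=(1.0, 1.0, 1.0)):
    """Average the combined loss over the hardest regions only.

    Each argument is a list with one loss value per candidate region.
    `weights` are hypothetical task weights for detection, color, and type.
    """
    w_det, w_color, w_type = weights
    per_region = [w_det * d + w_color * c + w_type * t
                  for d, c, t in zip(det_losses, color_losses, type_losses)]
    keep = max(1, min(keep, len(per_region)))  # keep at least one region
    hard = hardest_indices(per_region, keep)
    return sum(per_region[i] for i in hard) / len(hard), hard
```

Selecting by loss means that rare color and type combinations, which tend to be misclassified and therefore incur large losses, are sampled more often than their frequency in a long-tailed dataset would suggest.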
Abstract: Template matching algorithms consider all possible transformations, such as rotation, scaling, and affine transformations, and are commonly used to find the corresponding image regions between image pairs. However, the following two issues negatively affect their accuracy. 1) When the photography baseline increases, the effective information of the area to be matched in the target image decreases and the matching accuracy gradually decreases. 2) The choice of matching areas significantly influences the matching accuracy; thus, the matching results may differ considerably when different regions are selected as matching templates. We propose a template selection and matching algorithm for image matching to resolve these problems. First, sampling vector normalized cross-correlation (SV-NCC) is proposed to measure the regional consistency between two images using multichannel features. The proposed method discards two parameters, i.e., Δ and , which play an important role in suppressing the interference of light and noise in the CSAD method but reduce matching accuracy. To compensate, the NCC method is introduced, whose "mean" and "cross-correlation" terms inhibit the effect of light and noise. Template matching is conducted in the Lab color space to better reduce the influence of illumination changes. When computing the color similarity of two images, if the Euclidean distance of two color components
Abstract: Fine-grained visual recognition, which requires domain-specific expert knowledge, is a difficult problem in computer vision. This study addresses the accuracy and speed of fine-grained visual categorization. The task is challenging because all the objects in a fine-grained dataset belong to the same general category, with only subtle differences between classes. These subtle differences cannot be easily distinguished by people without expert knowledge. Moreover, a computer cannot easily extract the features and classify them into the correct category. This study uses a convolutional neural network to extract features and predict the fine-grained class, thereby improving recognition accuracy. The task includes object detection, object recognition, and object classification. In this study, we propose a convolutional neural network architecture based on deep region networks. We use this architecture because the convolutional neural network can improve the extraction and classification accuracy significantly. The deep architecture can also extract fine and coarse features that are useful for classification. First, we extract deep features with the feature extraction network to create a feature map. Every convolution layer extracts features through a weight matrix, which is learned by forward and back propagation to minimize the loss function. The proposed feature extraction network has two alternative architectures, i.e., VGG and residual networks. We use the VGG 16-layer and Res 101-layer deep networks as the feature extraction networks. The deep features extracted by the feature extraction network are shared with the subsequent network. Second, we use the region proposal network architecture to implement the region proposal step. The feature map extracted by the feature extraction network is convolved with a 3×3 convolution kernel with three different feature sizes and aspect ratios. Through convolution of the feature map, we 
derive 512-dimensional vectors that can be used to conduct bounding box classification and regression. A region is taken as a region of interest (RoI) if it is classified as containing an object. RoI pooling is used to apply max pooling on each RoI. This pooling shares the network computation and avoids repeated computation. Third, the proposed regions are input into the region convolutional neural network to predict the class and score of every fine-grained category and compute the bounding box regression for object detection. Finally, the deep region convolutional neural network outputs four results. Two results are the classification results that the network predicted with maximum and minimum scores. The two other results are the regression results, which comprise four values and two coordinates (the lower left corner and the upper right corner) that form a rectangular area. We use the CUB_200_2011 dataset, which is designed for fine-grained image recognition and provides public annotations, including class labels, object bounding boxes, and part locations, to conduct fine-grained recognition of birds. The dataset contains 200 fine-grained categories of birds and a total of 11,788 pictures. Only the class label and object bounding box annotations are used in our evaluation. The proposed networks are trained and tested on this dataset and achieve superior performance. For the entire bird, the VGG16+R-CNN(RPN) network achieves a Top 1 accuracy of 90.88% and the Res101+R-CNN(RPN) network achieves a Top 1 accuracy of 91.72%. Meanwhile, the VGG16+R-CNN(RPN) network achieves a Top 5 accuracy of 98.15% and the Res101+R-CNN(RPN) network achieves a Top 5 accuracy of 98.24%. The Top 5 accuracy of both networks exceeds 98%. We also conduct experiments on certain parts of the bird, such as the head. For the head of the bird, the VGG16+R-CNN(RPN) network achieves a Top 1 accuracy of 90.70% and the Res101+R-CNN(RPN) network achieves a Top 1 accuracy of 91.06%. Meanwhile, the 
VGG16+R-CNN(RPN) network achieves a Top 5 accuracy of 98.04% and the Res101+R-CNN(RPN) network achieves a Top 5 accuracy of 98.07%. We also analyze the effect of certain parts of the bird, such as the beak, head, belly, leg, and foot, on the performance of object detection, object recognition, and object classification. This study proposes a network architecture based on deep region convolutional neural networks that achieves superior performance compared with other models in terms of object detection, object recognition, and object classification, even though fine-grained visual recognition is difficult for ordinary people and for other network architectures. Our method exhibits high accuracy and does not need extra training data, which makes it robust and applicable to various datasets. The proposed model is applicable to object detection and fine-grained image categorization. The experimental results also show that local information, such as the head, is useful for fine-grained image recognition.
Keywords: fine-grained; deep region network; convolutional neural network; classification of birds; residual network
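The "three different feature sizes and aspect ratios" of the region proposal step correspond to a small set of reference boxes (anchors) placed at every feature-map location. A minimal sketch of generating the nine anchors for one location is given below. The concrete scale and ratio values are assumptions borrowed from common region proposal network settings, not values stated in the abstract, and the ratio is taken here as height divided by width.

```python
import math

def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchors centred at (cx, cy) as (x_min, y_min, x_max, y_max).

    Each anchor keeps an area of scale**2 while its height/width ratio varies.
    The default scales and ratios are illustrative assumptions.
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)   # width shrinks as the box gets taller
            h = s * math.sqrt(r)   # height grows by the same factor
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors
```

For each anchor, the region proposal network then predicts an objectness score and four regression offsets from the shared 512-dimensional feature vector.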
Abstract: The rapid popularization of hardware devices, such as intelligent terminals, has made it convenient to interact with the computer, thereby enabling people to begin enjoying the advantages of natural human-computer interaction. The growing number of successful handwriting recognition systems has made sketch-based interactive technology, which combines sketch and handwriting recognition, a major research area. Computer-aided teaching has become the most commonly used teaching method and can effectively help students understand textbook knowledge. Kinematics is a branch of theoretical mechanics and is the basis of various engineering disciplines. In kinematics courses, teachers usually draw a hand-drawn diagram or force diagram on the blackboard to help students understand abstract principles or formulas. In the teaching process, students study most of the movements and their results through imagination; thus, they cannot intuitively feel and understand the theoretical knowledge. Therefore, the abstract nature of theoretical knowledge and the difficulty of drawing conclusions from the diagram lead to learning difficulties. At present, many achievements have been made in the use of computer aids in physics courses, but most of them are based on images, videos, and other materials. Thus, teachers can only teach through display, which lacks interaction, and the teaching content and form are inflexible. Modern computer technology should be applied to solve the problems encountered in physics teaching. On the basis of the traditional teaching form, the advantages of modern computer technology should be fully utilized to construct an intuitive and flexible education model, which can not only help teachers explain abstract physics theory but also provide students with a flexible and active learning platform. Therefore, we provide a method to recognize the mechanisms in handwritten kinematics diagrams and simulate their motion.
On the basis of an analysis of the common mechanisms in the teaching of kinematics, this study summarizes 16 common links. By analyzing these links, we divide the constituent primitives of the link diagram into seven categories, namely, straight line, broken line, circle, point, arc, rectangle, and triangle. Then, we propose a recognition algorithm for links based on the recognition of handwritten primitives. During link drawing, each stroke is recognized and the types of strokes are recorded. First, the stroke geometry is extracted and used to identify the type of stroke. By analyzing users' diagram-drawing habits, we summarize each link in terms of the composition of and the relationship among its primitives to establish a link library. When the link drawing is finished, we obtain a list of stroke types. Then, we combine the list of stroke types with the relative positions between strokes and recognize the links by matching against the link library. This study establishes constraint relationships that conform to the users' intention and applies position correction to the links in the mechanism diagram. Finally, the movement of the mechanism is simulated on the basis of a 2D physics engine. The recognition of 16 types of common links in kinematics teaching was achieved. The recognition accuracy of most of the links was 90% or higher, and that of ball, rod, and line reached 100%. The average recognition accuracy was 93.25%. Moreover, the average recognition accuracy of the seven types of primitives commonly used in the links was 94%.
This study summarizes 16 types of links used in the teaching of kinematics, analyzes the positional relationships between the links, and determines the position constraint relationships among them. According to the elements of the links and the relative relationships between the primitives, we design a link library for the kinematic mechanism diagram. We propose a link recognition method based on primitive recognition, which recognizes links by matching the composition of and the relationships among the primitives. The relative position and connection relationship of the primitives in the link are calculated and matched with the link library. On the basis of the position constraint relationships of the links in the schematic of the mechanism, the attribute parameters of the position constraints in each link are analyzed and the position correction rules between the links are designed. Position correction is then applied to the users' handwritten input. Experimental results show that the method can recognize the handwritten kinematic links and correct the constraint relationships between these links. By correcting the position of the links, we obtain a motion diagram of the mechanism that conforms to the users' intention. In addition, we design a simulation platform and simulate the movement process. This method exhibits a high recognition rate for the kinematic links input by different users in the sample database and supports user-defined combinations of links to form personalized mechanisms and simulate their movement.
Keywords: sketch recognition; kinematics diagram; position constraint; relative position correction; motion simulation
Abstract: The continued expansion of the video surveillance market has produced a large amount of surveillance video data, which are difficult to store and process. In video anomaly detection for monitoring scenes, identifying abnormal events rapidly and accurately is particularly important. In this study, the optical flow features extracted from the video are taken as an example. On the basis of the spatial structure association of the optical flow features, the relationship among the optical flow features is preserved after multiscale transformation of the graph structure. Our method achieves rapid anomaly detection by reducing the number of optical flow features. Therefore, a video anomaly detection method based on multiscale transformation of the graph structure is proposed. To exploit the relevance of the spatial structure of the optical flow features in the video, we construct a network graph structure of the optical flow features. Under the relevant constraints, iterative scale transformation of the graph structure is used to effectively reduce the number of optical flow features and thus complete feature optimization in video anomaly detection. The scale transformation of the graph structure proceeds as follows. First, we use the polarity of the eigenvector associated with the largest eigenvalue of the Laplacian matrix of the graph structure to filter the vertices and complete the graph downsampling. Then, we use Kron reduction to construct the inner connections between the retained vertices and reconstruct the graph structure of the optical flow features. Therefore, after the multiscale transformation of the graph structure, we obtain a graph structure of the optical flow features with a small number of vertices that are closely related to the spatial features. Thus, the optical flow features are optimized and the subsequent anomaly detection becomes rapid and efficient. In 
video monitoring, the multiscale transformation of the graph structure helps store and process the feature data of the current video monitoring. Experimental results show that this method can improve the detection speed of the video anomaly detection algorithm at the cost of only a slight decrease in detection accuracy. On the UMN dataset, when the scale number of the graph structure is one, the detection accuracy is reduced by 3.2% but the detection speed is improved by 19.1%. Thus, the detection speed over the entire video set is significantly improved. When the scale number is two, the detection accuracy is decreased by 7.3% and the results cannot meet the actual requirements. Thus, effective anomaly detection on this dataset is achieved only when the scale number is one. On the Web dataset, when the scale number of the graph structure is one, the detection accuracy is reduced by 1.9% but the detection speed is increased by 32%. When the scale number is two, the detection accuracy is reduced by 4.8% but the detection speed is improved by 51%. Therefore, on the basis of the detection accuracy and detection speed, we select a scale number of one or two. With a further increase in the scale number, the detection effect cannot meet the requirements. From the two experiments conducted to verify the proposed method, we can conclude that the multiscale transformation of the graph structure performs well in video anomaly detection: with a slight reduction in detection accuracy, it markedly improves the detection speed.
In this study, we use an irregular network graph structure to fully describe the spatial relationships between features. After the multiscale transformation of the graph structure, a strong spatial relationship between the features is maintained. In different video surveillance scenes, we select the appropriate scale number in accordance with the detection accuracy and detection speed to achieve rapid anomaly detection.
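One scale transformation of the graph, as described in this abstract, consists of vertex selection by eigenvector polarity followed by Kron reduction. A minimal NumPy sketch is given below, assuming a connected, undirected graph given as a symmetric weight matrix. The function names are hypothetical, the choice of keeping the non-negative entries is an assumption, and the construction of the graph from optical flow features is not covered.

```python
import numpy as np

def laplacian(weights):
    """Combinatorial graph Laplacian L = D - W of a symmetric weight matrix."""
    return np.diag(weights.sum(axis=1)) - weights

def polarity_downsample(lap):
    """Keep the vertices whose entry in the eigenvector of the largest
    eigenvalue is non-negative. The sign of an eigenvector is arbitrary,
    so the complementary vertex set is an equally valid choice."""
    _, vecs = np.linalg.eigh(lap)   # eigenvalues in ascending order
    u_max = vecs[:, -1]
    return np.where(u_max >= 0)[0]

def kron_reduction(lap, keep):
    """Laplacian of the reduced graph on `keep`: the Schur complement
    L_kk - L_kd @ inv(L_dd) @ L_dk, where d indexes the dropped vertices."""
    drop = np.setdiff1d(np.arange(lap.shape[0]), keep)
    if drop.size == 0:
        return lap.copy()
    l_kk = lap[np.ix_(keep, keep)]
    l_kd = lap[np.ix_(keep, drop)]
    l_dd = lap[np.ix_(drop, drop)]
    return l_kk - l_kd @ np.linalg.solve(l_dd, l_kd.T)
```

Applying `polarity_downsample` and `kron_reduction` repeatedly gives coarser graphs, one per scale. Each pass removes vertices while the Schur complement reconnects the survivors, so the reduced matrix is again a valid Laplacian.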
Abstract: Target tracking plays an important role in computer vision and is widely used in intelligent traffic, robot vision, and motion capture. In actual scenes, the accuracy of target tracking is low because of illumination changes, target deformation, partial occlusion, and complex backgrounds. Thus, mitigating the influence of these factors and improving the tracking accuracy are major issues in target tracking. Target model matching methods are widely used in the detection and tracking of moving targets. In recent years, researchers have proposed several excellent tracking methods based on target model matching. Babenko et al. proposed a target tracking method based on online multiple instance learning, which selects an appropriate number of positive and negative templates around the target for tracking. The authors also constructed a discriminant model to achieve tracking and updated the appearance model of the target in real time. Wang et al. proposed a superpixel tracking method, which extracts the target model from the background and forms a partitioned target model. The authors also calculated the possible position of the target at the subsequent moment using maximum a posteriori estimation and the pixel confidence map. Mei et al. proposed a tracking method based on sparse representation classification, which formulates tracking as a sparse approximation problem. The authors also determined the final tracking result on the basis of the size of the reconstruction residuals. Bao et al. proposed a real-time sparse representation tracking method, which uses multiple target models and sparse representation classification. This method can improve the tracking effectiveness while maintaining a high tracking accuracy. Kalal et al. proposed a tracking method based on the combination of tracking, learning, and detection. This method is robust to local occlusion and target deformation. Oron et al. proposed locally orderless tracking, which 
divides the target into a plurality of superpixel blocks and tracks the target in subsequent frames by matching the pixels. The authors also used a particle filter to constrain target model matching and thus ensure robust tracking. Traditional model matching and tracking methods are easily affected by local occlusion from other targets and by complex backgrounds. Thus, a novel tracking approach, the bidirectional optimization tracking method under foreground partition (BOTFP), is proposed to solve these problems. In the first frame, the target area is delineated manually, and the color and texture features of the target region are extracted and used to establish the discriminant appearance model. Subsequently, the similarity between the local features of the test images and the appearance model is calculated using the bidirectional optimization similarity matching method to complete the model matching process. This study presents a foreground partition method, which yields accurate matching results by avoiding the interference of complex backgrounds and similar targets. Finally, an online model updating algorithm that introduces a distance decision method is proposed. This algorithm can be used to determine whether a false match occurs, avoid the interference of similar targets in the foreground region, and ensure that the model describes the target accurately.
Compared with other leading tracking algorithms, the proposed algorithm achieves the same or even higher tracking accuracy. The average center errors in the video sequences Girl, Deer, Football, Lemming, Woman, Bolt, David1, David2, Singer1, and Basketball are 7.43, 14.72, 8.17, 13.61, 24.35, 7.89, 11.27, 13.44, 12.18, and 7.79, respectively. The tracking overlap ratios in the same sequences are 0.69, 0.58, 0.71, 0.85, 0.58, 0.78, 0.75, 0.60, 0.74, and 0.69, respectively. The average running speeds (frame/s) in the same sequences are 8.14, 7.32, 7.78, 6.69, 6.31, 7.57, 6.73, 7.17, 5.97, and 6.38, respectively. Compared with similar methods (e.g., L1APG, TLD, and LOT), the average tracking overlap rate of the proposed method is higher by approximately 20%. Experimental results indicate that using the color and texture features of the target in bidirectional optimization similarity matching of the foreground region ensures accurate tracking and strong adaptability under partial occlusion, deformation, and complex backgrounds. The characteristics of BOTFP are as follows: 1) An accurate appearance model is obtained by using the color and texture features of the target in bidirectional optimization similarity matching of the foreground region. 2) The foreground information of the image frames is estimated and evaluated, and the matching process is restricted to avoid the interference of background information. 3) The bidirectional optimization similarity matching method makes the result robust. 4) An online model updating algorithm is proposed, which can be used to determine the accuracy of the model.
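The two accuracy measures reported above can be illustrated with a minimal sketch. The abstract does not define them, so this assumes the common conventions: the center error is the Euclidean distance between the centers of the tracked and ground-truth boxes, and the overlap ratio is intersection over union. The function names and the (x, y, w, h) box layout are hypothetical choices for illustration.

```python
import math


def center_error(box_a, box_b):
    """Euclidean distance between the centers of two (x, y, w, h) boxes."""
    ax, ay = box_a[0] + box_a[2] / 2.0, box_a[1] + box_a[3] / 2.0
    bx, by = box_b[0] + box_b[2] / 2.0, box_b[1] + box_b[3] / 2.0
    return math.hypot(ax - bx, ay - by)


def overlap_ratio(box_a, box_b):
    """Intersection area divided by union area (assumed definition)."""
    left = max(box_a[0], box_b[0])
    top = max(box_a[1], box_b[1])
    right = min(box_a[0] + box_a[2], box_b[0] + box_b[2])
    bottom = min(box_a[1] + box_a[3], box_b[1] + box_b[3])
    inter = max(0.0, right - left) * max(0.0, bottom - top)
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


def sequence_averages(tracked, truth):
    """Average center error and overlap ratio over one video sequence."""
    n = len(tracked)
    avg_err = sum(center_error(t, g) for t, g in zip(tracked, truth)) / n
    avg_ovl = sum(overlap_ratio(t, g) for t, g in zip(tracked, truth)) / n
    return avg_err, avg_ovl
```

Averaging per sequence, as in `sequence_averages`, mirrors how the figures above are reported (one value per sequence), although the exact averaging used in the study is not stated.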
Abstract: Traditional shape from shading (SFS) algorithms inaccurately estimate the initial direction of the light source, so the reconstructed surface comes out too smooth for objects with a rough surface. The ideal reflection model, i.e., the Lambertian surface, is too simple to meet the conditions in real applications. Therefore, the reconstructed shape of the object presents serious errors. In this study, a reflection model based on the radial basis function (RBF) neural network is established to solve this problem. Moreover, the fixed learning rate in the traditional algorithm slows down the training process. Thus, the learning rate of the neural network is made adaptive to accelerate training and, at the same time, avoid being trapped in a local minimum. The RBF-based reflection model replaces the ideal Lambertian reflection model used in the traditional SFS algorithms. The excellent local mapping and function approximation capabilities of the RBF are well suited to the classic SFS problems. The original information of the light source is replaced by the weights in the training process of the RBF neural network. Then, the SFS problem is transformed into finding the optimal solution of its energy equations. In this manner, the limitation that the light source direction must be known in advance is removed. Only a single image is needed in the improved algorithm to restore the 3D surface of the target object. However, the fixed learning rate in the traditional neural network easily leads to a local minimum when the initial parameters are inappropriate and slows down training in practice. Therefore, an adaptive learning rate algorithm is added to the network to accelerate convergence and training. The learning rate is adjusted automatically over the iterations.
In the experiments, the improved algorithm is tested on four images: two synthetic images classic in SFS problems, Vase and Mozart, and two real images, Map and Antique. The 3D visual effects are then compared, and a quantitative comparison of the results is conducted. The experimental results show that the 3D visual effects of the improved algorithm are better than those of the traditional SFS algorithms. The reconstruction of the 3D shape information is clearly superior to that of the traditional algorithm. The traditional SFS algorithms always show serious errors in real images and some errors in details in synthetic images. The improved algorithm using RBF performs well on the synthetic images and yields improved results on the real images. The min-max normalization function is applied to the height information so that the absolute errors can be compared clearly. According to the comparison of the normalized 128-pixel height error from the map, the normalized 3D height error of the improved algorithm is more than 60% lower than that of the traditional SFS algorithms. The entire calculation process of the improved algorithm takes 50% less time than that of the traditional algorithms. Moreover, the improved algorithm is appropriate for both synthetic and real images.
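The min-max normalization step mentioned above can be sketched as follows. This is an illustrative sketch only: it assumes the heights are compared along a one-dimensional profile and that the error of interest is the largest absolute difference after both profiles are rescaled to [0, 1]. The function names are hypothetical, and the exact error measure used in the study may differ.

```python
def min_max_normalize(values):
    """Rescale a sequence of heights linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat profile carries no relative height information.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def max_abs_error(estimated, reference):
    """Largest absolute difference between two normalized height profiles."""
    est = min_max_normalize(estimated)
    ref = min_max_normalize(reference)
    return max(abs(e - r) for e, r in zip(est, ref))
```

Normalizing both profiles first removes any difference in overall height scale, so the remaining error reflects differences in shape only.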
The ideal Lambertian reflection model cannot easily meet practical needs because the assumption that the surface reflects light equally in all directions is inaccurate in real applications. Therefore, the results of the traditional SFS algorithm for objects with a rough surface exhibit many errors. The Lambertian reflection model is only suitable for some synthetic images with a smooth surface. For real images, however, the 3D surface reconstructed with this model cannot be precise: the shape of the restored images contains serious errors, and the 3D visual effect is poor. With some real images, such as Map, the basic shape of the mountain can barely be distinguished. The RBF is introduced to the SFS problems to establish a new reflection model that solves this problem. The reflection model based on the RBF replaces the original direction of the light source with the weights in the neural network. By training the relevant parameters in the network, the improved algorithm overcomes the shortcomings of traditional methods, e.g., the inaccurate estimation of the original light source. The new reflection model is suitable for synthetic and real images and shows excellent 3D recovery capability in real images with a rough surface. Moreover, the adaptive learning rate algorithm accelerates training, thereby clearly improving efficiency. The training time of the new reflection model is 50% less than that of the traditional algorithm. The maximum absolute error of the new reflection model is more than 60% lower than that of the traditional SFS algorithm. The improved algorithm is applicable to the 3D reconstruction of surfaces in most non-high-precision situations, without the need for initial information on the direction of the light source.
Keywords: shape from shading; radial basis function; neural network; reflection model; 3D reconstruction
Abstract: The information in a single-modality medical image is limited; thus, it cannot reflect all the details of the relevant tissues and may cause misdiagnosis in the clinical setting. To address this problem, a scientific and effective fusion algorithm is proposed to fuse multimodal medical images, enrich the information in the fused image, and improve image quality to provide a basis for clinical diagnosis. A medical image fusion algorithm based on the non-subsampled shearlet transform (NSST) is proposed. First, low- and high-frequency sub-bands are obtained using NSST. Then, on the basis of the features of the low-frequency sub-band images, a fusion rule based on low-frequency coefficients combined with a pulse-coupled neural network is adopted for the low-frequency sub-band images. On the basis of the different structural similarities (SSIM) of the high-frequency sub-band images, a fusion rule combining the visual sensitivity coefficient (VSC) with improved gradient energy is adopted for low-SSIM sub-bands, whereas a fusion rule combining VSC with regional energy is applied to high-SSIM sub-bands. Furthermore, closed-loop feedback is introduced into the fusion rule to optimize variables adaptively, using the sum of the SSIM and the edge-based similarity measure as the objective evaluation. The fused image is reconstructed by the inverse NSST.
Experiments are conducted on gray and color images, and the method is compared with four other fusion methods in terms of subjective visual effect and objective evaluation criteria. The method exhibits a good fusion effect. It is the best on the edge-related evaluation criteria and performs well on the other indicators. Compared with the multi-modality medical image fusion method based on the non-subsampled contourlet transform by Jin Zhenyi, for five groups of images, the spatial frequencies increased by 11.8%, 24.7%, 83.4%, 11.9%, and 30.3%; the edge-based similarity measures increased by 6.7%, 15%, 40%, 50%, and 12%; the SSIM increased by 0.7%, 7.3%, 2.4%, -3.6%, and 2.1%; and the cross-entropy measures decreased by 16.9%, 1.6%, -27.4%, 6.1%, and 0.4%. The proposed algorithm can effectively improve the quality of multimodal medical image fusion and increase the complementary information among different modalities. This algorithm is superior to the compared medical image fusion algorithms. The fused images contain richer information and accord better with the characteristics of human vision.
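Of the objective criteria listed above, spatial frequency is simple enough to sketch. The abstract does not give its formula, so the sketch below assumes the commonly used definition: the square root of the sum of the squared row frequency and squared column frequency, each a mean squared difference between adjacent pixels. The image is taken as a list of rows of gray values, and the function name is hypothetical.

```python
import math


def spatial_frequency(img):
    """Spatial frequency of a gray image given as a list of equal-length rows."""
    rows, cols = len(img), len(img[0])
    # Squared differences between horizontally adjacent pixels (row frequency).
    rf = sum((img[i][j] - img[i][j - 1]) ** 2
             for i in range(rows) for j in range(1, cols))
    # Squared differences between vertically adjacent pixels (column frequency).
    cf = sum((img[i][j] - img[i - 1][j]) ** 2
             for i in range(1, rows) for j in range(cols))
    return math.sqrt((rf + cf) / (rows * cols))
```

Under this definition, a higher value indicates more gray-level activity in the fused image, which is why an increase is read as an improvement.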
Abstract: The density of the road network is an important indicator of the accessibility of regional traffic, and obtaining the road pixels in an image is the first step in calculating it. This study assumes that the spectral characteristics of urban trunk roads in high-resolution remote sensing images are consistent along the road direction. A 2D Gabor filter is designed on the basis of an analysis of the angular texture diagram of pixel points and the direction of minimum gray variance. The group of filtered values is used as the representation of each pixel. The road pixels are extracted by a k-means clustering segmentation strategy and are refined into the trunk road network. Completion rate, accuracy, and extraction quality are used as the evaluation indicators of precision, and k-means clustering is used as the segmentation strategy. When the segmentation objects are the gray value, the gray-level co-occurrence matrix, the multichannel 2D Gabor filtered value, and the feature vector adopted in this study, the completion rate, accuracy, and extraction quality are 0.45, 0.51, and 0.37; 0.62, 0.70, and 0.54; 0.58, 0.66, and 0.52; and 0.72, 0.78, and 0.65, respectively. When the Hough transformation method is used as the extraction strategy, the completion rate, accuracy, and extraction quality are 0.41, 0.56, and 0.34, respectively. When the multiscale segmentation method is used as the extraction strategy, the completion rate, accuracy, and extraction quality are 0.41, 0.56, and 0.34, respectively. Therefore, under the same segmentation strategy, the segmentation objects adopted in this study yield higher classification accuracy than the other objects. Compared with traditional classification methods based on linear or planar characteristics, the method adopted in this study exhibits a certain advantage in terms of accuracy.
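The idea of a direction of minimum gray variance can be illustrated with a small sketch. It is not the study's implementation: it assumes that, for each pixel, gray values are sampled along short line segments in a fixed set of directions and that the direction with the lowest variance is taken as the likely road direction. The segment half-length, the number of directions, and the function name are hypothetical parameters.

```python
import math


def min_variance_direction(img, r, c, half_len=3, n_dirs=8):
    """Return (angle in radians, variance) of the lowest-variance direction.

    img is a list of rows of gray values; (r, c) must lie inside the image.
    """
    rows, cols = len(img), len(img[0])
    best = None
    for k in range(n_dirs):
        theta = math.pi * k / n_dirs  # directions cover [0, pi)
        dr, dc = -math.sin(theta), math.cos(theta)
        samples = []
        for t in range(-half_len, half_len + 1):
            rr, cc = int(round(r + t * dr)), int(round(c + t * dc))
            if 0 <= rr < rows and 0 <= cc < cols:
                samples.append(img[rr][cc])
        mean = sum(samples) / len(samples)
        var = sum((s - mean) ** 2 for s in samples) / len(samples)
        if best is None or var < best[1]:
            best = (theta, var)
    return best
```

On a pixel inside a road whose gray level is uniform along its length, the variance along the road direction is close to zero, which matches the assumption stated above that the spectral characteristics are consistent along the road direction.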
In this study, a new road pixel extraction method is proposed, in which the image is segmented by creating a group of 2D Gabor filtered values with a specific direction. Experimental results show that this method presents high anti-noise performance and universality and can be applied to high-resolution remote sensing images from GF-1, GF-2, IKONOS, QuickBird, and other sensors.
Abstract: With their wealth of spatial and spectral information, hyperspectral remote sensing images have been used in many remote sensing applications and have attracted considerable attention. However, most hyperspectral remote sensing images suffer from degradation because of the distortion of atmospheric transmission, the limitations of electronic devices, and poor illumination. The degraded data can lead to seriously inaccurate results in subsequent applications. Thus, a new model based on low-rank representation and mixture total variation regularization is proposed in this study to restore hyperspectral remote sensing images. First, two types of low-rank prior information in hyperspectral remote sensing images, i.e., low-rank prior information in the spectral domain and in the spatial domain, are explored. Then, on the basis of the low-rank prior information in the spectral domain, a low-rank representation model is proposed to suppress sparse noise, such as impulse, stripe, and dead line noise. Subsequently, on the basis of the low-rank prior information in the spatial domain, the mixture total variation regularization method is proposed to suppress dense noise, such as Gaussian and Poisson noise. The mixture total variation is a linear combination of the anisotropic and isotropic total variations and approximates the zero norm more closely than either of them alone. The restoration results of the mixture total variation regularization method are better than those of the traditional total variation-based methods. Finally, the low-rank representation and mixture total variation regularization models are integrated. As such, the new restoration model possesses the advantages of both.
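The mixture total variation described above can be sketched for a single band. The abstract says only that it is a linear combination of the anisotropic and isotropic total variations; the weights below, including their signs, are placeholders and not values from the study. The sketch uses forward differences with a zero difference at the image border, and the function names are hypothetical.

```python
import math


def total_variations(img):
    """Return (anisotropic TV, isotropic TV) of a 2D band given as rows."""
    rows, cols = len(img), len(img[0])
    aniso = 0.0
    iso = 0.0
    for i in range(rows):
        for j in range(cols):
            # Forward differences; the difference is zero past the border.
            dx = img[i][j + 1] - img[i][j] if j + 1 < cols else 0.0
            dy = img[i + 1][j] - img[i][j] if i + 1 < rows else 0.0
            aniso += abs(dx) + abs(dy)      # sum of absolute differences
            iso += math.hypot(dx, dy)       # gradient magnitude
    return aniso, iso


def mixture_tv(img, w_aniso=1.0, w_iso=-0.5):
    """Linear combination of the two TVs; the weights are placeholder values."""
    aniso, iso = total_variations(img)
    return w_aniso * aniso + w_iso * iso
```

In a restoration model this quantity would serve as the regularization term applied to each band, alongside the low-rank term; how the two are weighted against each other is not specified in the abstract.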
This study tests the performance of the proposed method on a set of challenging hyperspectral remote sensing images. The simulated noises are Gaussian, Poisson, salt-and-pepper, and dead line noises. The intensity of the mixture noise differs from band to band so that the simulation approximates real situations as closely as possible. The restoration results of the proposed method are compared with those of several related methods. After the images are restored, the peak signal-to-noise ratio and structural similarity indices are adopted to provide quantitative assessments of the results. All the experiments show that the proposed method achieves better visual quality and quantitative indices than several existing related methods. The proposed model relies on low-rank prior knowledge in the spectral and spatial domains, efficiently suppresses the sparse and dense noise in degraded hyperspectral remote sensing images, and restores better images than the compared methods. The proposed model can be extended to other fields of remote sensing applications.
Abstract: An unmanned aerial vehicle (UAV) exhibits many advantages in remote sensing because of its high efficiency, high resolution, low cost, and simple operation. However, most UAVs carry only visible-light true-color sensors, which contain only red, green, and blue bands. Generating some of the most commonly used visible-near infrared-based vegetation indices, such as the normalized difference vegetation index (NDVI) and the soil-adjusted vegetation index, is therefore difficult. Although hyperspectral and multispectral sensors can produce these indices, the high cost and complexity of data acquisition hinder the further development of UAV technology in the field of vegetation remote sensing. To solve this problem, a new vegetation index that fully utilizes the visible-light true-color image in the HSL color space, called the Normalized Hue and Lightness Vegetation Index (NHLVI), is proposed. The hue and lightness characteristics of different objects in the HSL color space model were analyzed within visible-light true-color images, and the hue (H) and lightness (L) components were selected because of their weak correlation. After their normalization, NHLVI was constructed on the basis of the form of NDVI to enhance the vegetation information. A total of 88 visible true-color and 163 multispectral images, which covered a test area with different vegetation types and coverage, were acquired through a UAV flight campaign to verify the validity of this new vegetation index. The structure from motion algorithm was used to mosaic the UAV images, and the digital orthophoto map of the test area was produced. Then, the NHLVI and several commonly used visible vegetation indices, i.e., normalized green-red difference index (NGRDI), excess green, vegetation index, color index of vegetation, excess green minus excess red, combination, and combination 2, were calculated. Hyperspectral datasets were simultaneously collected using the ASD HandHeld2 during the UAV flight and were resampled to match the visible true-color and multispectral sensors in accordance with their spectral response functions. For the multispectral image, the digital number value was converted to reflectance through the empirical line method. All the vegetation indices calculated from visible true-color UAV imagery were compared with NDVIs generated from ASD and UAV multispectral data. The receiver operating characteristic (ROC) curve was employed to analyze and determine the threshold to extract and compare vegetation information further. The results from different vegetation indices were compared.
The correlation coefficient was used to evaluate the performance of all the vegetation indices from visible-light true-color imagery. The correlation coefficients between the different visible true-color-based vegetation indices and the ASD-based and/or multispectral UAV-based NDVIs show that the proposed vegetation index is highly correlated with NDVI, with a correlation coefficient of 0.7768, followed by NGRDI (0.6874). The ROC curve was utilized to explore the capability of the proposed vegetation index to extract vegetation information. NGRDI and NDVI were selected and compared with NHLVI. First, the area under the ROC curve was calculated for NHLVI (0.777); it is smaller than that for NDVI (0.815) but larger than that for NGRDI (0.681). This finding indicates that NHLVI outperforms NGRDI in terms of vegetation extraction. Second, vegetation information was extracted by the predefined threshold and compared between different vegetation indices through visual interpretation. Given the difficulties in threshold determination, the ROC curve method was selected to decide the threshold. By selecting the point where the curve meets a line with slope equal to 1, the sensitivity and specificity were maximized, whereas the omission and commission errors were minimized. Finally, quantitative analysis was performed: 400 random points were positioned on the UAV visible-light true-color image to derive the NHLVI, and the NDVI was extracted from the multispectral UAV image as a reference to evaluate the vegetation extraction accuracy. The overall vegetation extraction accuracy using NHLVI is 82.25%, which is higher than that of NGRDI (79.75%) and lower than that of NDVI (84.00%). The commission error of NHLVI is 6.64%, and the omission error is 22.82%. Notably, compared with the other vegetation indices, e.g., NGRDI, the proposed vegetation index performs outstandingly in sparsely vegetated areas.
The NHLVI was proposed in this study. This index takes advantage of the HSL color space transformation and makes good use of high spatial resolution visible-light true-color images, e.g., UAV images. Experimental results show that the proposed vegetation index, compared with several commonly used vegetation indices, possesses a stronger correlation with NDVI and exhibits the same value range, i.e., [-1, 1]. Vegetation cover information can be extracted using the threshold set by the ROC method. The proposed method can achieve outstanding results compared with other vegetation indices, particularly in sparsely vegetated areas, and presents nearly the same accuracy as NDVI. Overall, NHLVI is suitable for vegetation extraction from visible-light true-color images from UAVs and provides a new method for the application of UAV-based remote sensing. Future work will focus on its application to different types of vegetation and its verification with different sensors.
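A per-pixel sketch may help make the construction of the index concrete. The abstract states only that NHLVI is built from the normalized hue and lightness components in the form of NDVI; the exact formula is not given here. The sketch therefore assumes the difference-over-sum form (H - L) / (H + L), with H and L both scaled to [0, 1] by Python's standard `colorsys` module. The function name and the handling of black pixels are hypothetical.

```python
import colorsys


def nhlvi(r, g, b):
    """Assumed NHLVI of one 8-bit RGB pixel: (H - L) / (H + L).

    H and L come from colorsys.rgb_to_hls, which returns hue, lightness,
    and saturation, each in [0, 1].
    """
    h, l, _s = colorsys.rgb_to_hls(r / 255.0, g / 255.0, b / 255.0)
    if h + l == 0:
        # Pure black: the ratio is undefined, so return a neutral value.
        return 0.0
    return (h - l) / (h + l)
```

Because H and L are both non-negative, this form always falls in [-1, 1], which is consistent with the value range stated above. Under this assumed form, a dark green pixel such as (40, 120, 40) scores higher than a soil-colored pixel such as (150, 125, 100).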
Abstract: Ridge lines and valley lines are the most important terrain feature lines, which delineate the skeleton structure of terrain. Their automatic extraction from digital elevation models (DEM) is of great significance in applications such as automated relief generalization, hydrological analysis, and geographical information systems, and can also provide matching features for terrain model simplification and sample-based terrain synthesis. In most traditional algorithms, feature lines can only be filtered by length, by eliminating short terminal branches, and it is difficult to accurately control the saliency of the extracted feature lines. Because feature lines are usually modeled as trees, and break-all-loop algorithms such as polygon breaking and minimum spanning tree are commonly used, ringlike features such as craters cannot be correctly extracted without notches. Therefore, a new algorithm for saliency-controllable topographic feature line extraction from DEM, based on feature saliency, is proposed.
Our algorithm consists of five steps. First, feature points were extracted by the global profile scan algorithm, which can better eliminate noise and pseudo feature points, and the feature saliency of each feature point was calculated according to its height, ambilateral height drops, and slopes. Second, feature connectivity was enhanced by feature extension based on the feature directions of the feature points. Third, the feature point set was thinned to one-pixel width with an improved Hilditch algorithm, in which a neighborhood scan was added when a deletable point was found: among all points in the neighborhood that satisfied the removal conditions, the one with the smallest feature saliency was chosen for deletion. In this way, the thinning result was almost independent of the global scanning sequence, and salient feature points could be preserved. Fourth, a feature graph was constructed by taking feature points as nodes and adding edges between all neighboring ones. Many loops were highly likely to exist in the graph, most of them small and redundant. To support the extraction of ringlike features, small loops were broken while big ones were preserved. A loop detecting and breaking algorithm was devised to handle all kinds of loops, such as nested loops, loops sharing common sides, and loops with no interior points. Fifth, the feature graph was decomposed into branches, and feature lines were constructed by assembling consecutive branches, requiring that all branches inside a feature line have similar feature saliencies and that the directions of neighboring branches be consistent. Then, the feature saliency of each feature line was calculated, and all feature lines were sorted by it in descending order. By selecting a certain number of feature lines from the head of the sorted list, feature saliency-based filtering was simple to implement.
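The final filtering step described above (sort feature lines by saliency in descending order, then keep the first few) can be sketched as follows. The study's formula for the saliency of a feature line is not given in the abstract, so `line_saliency` below is a hypothetical stand-in: a weighted sum of line length, average point saliency, and maximum point saliency, with placeholder weights. Each feature line is represented simply as the list of its points' saliency values.

```python
def line_saliency(point_saliencies, w_len=1.0, w_avg=1.0, w_max=1.0):
    """Hypothetical line saliency from length, average, and maximum values."""
    n = len(point_saliencies)
    avg = sum(point_saliencies) / n
    return w_len * n + w_avg * avg + w_max * max(point_saliencies)


def top_feature_lines(lines, k):
    """Sort feature lines by saliency (descending) and keep the first k."""
    ranked = sorted(lines, key=line_saliency, reverse=True)
    return ranked[:k]
```

Changing `k` is what makes the output saliency-controllable: a small `k` keeps only the main trunks, and a larger `k` admits progressively less salient lines.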
To validate the effectiveness of the proposed algorithm, it was compared with the existing feature saliency-based topographic feature line extraction algorithm by extracting some of the most significant feature lines from real DEM data. The proposed algorithm extracts feature line trunks more accurately and effectively avoids the overextension of trunks, which is attributed to the new feature graph decomposition strategy. In terms of saliency-based control, the new formula for the feature saliency of feature lines simultaneously considers the feature line length and the average and maximum feature saliency of all feature points on the line; it is therefore more reasonable, and the proposed algorithm provides more accurate control over the saliency of the extracted feature lines. To verify the proposed loop detecting and breaking algorithm, intermediate results before and after loop breaking were shown, and a synthesized DEM with a big crater was tested. Results show that the proposed algorithm preserves the large ridge line loop well while breaking all the small redundant ones. Because differences in thinning results for real DEM data are at the pixel level and can hardly be perceived by eye, a comparison was made on a man-made example of a 2-pixel-wide feature point band to exemplify the improved thinning algorithm. The improved algorithm deletes feature points with smaller feature saliency, which means that true features are more likely to be preserved.
The proposed algorithm can effectively extract ridge and valley lines with controllable saliency, which are consistent with human eye observation, and can support terrains with ringlike features. There are three main contributions. First, by exploiting the proposed feature graph decomposition algorithm and applying the devised formula to calculate the feature saliency of feature lines, decomposition results become more reasonable, and more precise control over the feature saliency of the extracted feature lines can be provided. Second, a novel algorithm was proposed to detect and break all redundant small loops while preserving large ones, and it can handle all complex loop cases. Third, the improved Hilditch algorithm was further improved on the basis of feature saliency so that more salient feature points are retained and the resulting skeleton lines may be more accurate for feature point bands 2 or 3 pixels wide. However, the proposed algorithm has several shortcomings. First, many parameters are set mainly on the basis of experience. In particular, the branch feature saliency dissimilarity threshold used in feature graph decomposition has a great influence on experimental results and should be carefully tuned. Second, compared with break-all-loop algorithms such as minimum spanning tree, the loop detecting and breaking algorithm is inefficient and becomes the bottleneck of the whole algorithm. Future work includes improving the efficiency of the loop detecting and breaking algorithm as well as parameter optimization and automatic parameter setting.
Keywords: topographic feature line extraction; feature saliency; digital elevation model; feature graph decomposition; feature line filtering