Latest Issue

    Vol. 24, No. 7, 2019

      Scholar View

    • Lu Meng, Chengxin Li
      Vol. 24, Issue 7, Pages: 1011-1016(2019) DOI: 10.11834/jig.190111
      Abstract:
      Objective: Target tracking is an important part of computer vision. In recent years, target tracking algorithms based on correlation filtering and deep learning have emerged continually. This paper elaborates on and analyzes several classical target tracking algorithms.
      Method: First, this paper introduces the basic theory of tracking algorithms based on correlation filtering, and then summarizes these algorithms in terms of feature improvement, scale improvement, boundary-effect elimination, image segmentation, and adaptive target response. Next, target tracking algorithms based on deep learning are expounded and analyzed from three aspects: target classification, structured regression, and Siamese neural networks. An in-depth interpretation of the advantages and defects of representative tracking algorithms is also provided.
      Result: The advantages and disadvantages of the algorithms in each phase are summarized through an enumeration of the enhanced tracking algorithms for the different improvement mechanisms in the correlation filtering and deep learning phases. This paper expounds on the latest progress of target tracking algorithms and summarizes their future development directions.
      Conclusion: Target tracking algorithms based on correlation filtering perform well in real time but still require optimization for complex background interference and occlusion by similar objects. Deep convolutional features have a strong representation of the target, and when correlation filtering is combined with deep learning, algorithm performance improves greatly. Tracking algorithms based purely on deep learning focus on tracking accuracy, and most of them cannot run in real time. The Siamese neural network has greatly advanced deep learning-based target tracking by balancing accuracy and real-time performance.
      Keywords: object tracking; correlation filtering; deep learning; Siamese neural network
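      As background for the correlation filtering theory surveyed above, the following is a minimal single-channel correlation filter tracker in the spirit of MOSSE, written in Python with NumPy. It is an illustrative sketch, not code from any of the surveyed papers; the Gaussian width and regularization values are assumptions.

```python
import numpy as np

def train_filter(patch, sigma=2.0, lam=1e-4):
    """Learn a correlation filter whose response to `patch` is a centered
    Gaussian peak (closed-form ridge regression in the Fourier domain)."""
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    gauss = np.exp(-((ys - h // 2) ** 2 + (xs - w // 2) ** 2) / (2 * sigma ** 2))
    G = np.fft.fft2(gauss)           # desired response
    F = np.fft.fft2(patch)           # training patch
    return G * np.conj(F) / (F * np.conj(F) + lam)

def detect(H, patch):
    """Correlate the filter with a new patch; the offset of the response
    peak from the patch center gives the estimated target displacement."""
    resp = np.real(np.fft.ifft2(H * np.fft.fft2(patch)))
    dy, dx = np.unravel_index(np.argmax(resp), resp.shape)
    return resp, (dy, dx)
```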

      Review

    • Active geometric reconstruction methods for objects: a survey

      Yanzi Kong, Feng Zhu, Yingming Hao, Qingxiao Wu, Rongrong Lu
      Vol. 24, Issue 7, Pages: 1017-1027(2019) DOI: 10.11834/jig.180607
      Abstract:
      Objective: Target modeling is one of the main research directions in the field of machine vision, and the technology is widely used in various fields. When modeling the geometry of an object, the data obtained from one viewpoint are often incomplete, and large-area losses may even occur. Therefore, obtaining information about the target from different viewpoints and fusing that information are necessary to achieve complete geometric modeling. Active object reconstruction is an intelligent perception method that achieves target modeling with few viewpoints and short motion paths by systematically adjusting the pose parameters of the camera while ensuring model integrity. To reflect the research status and latest developments of active object reconstruction, relevant studies since 2004 are combed and analyzed, and domestic and foreign research methods are summarized.
      Method: At present, active object reconstruction mainly addresses two task types: model-based and non-model active object reconstruction. Model-based methods pre-plan a series of viewpoints before modeling and can achieve full coverage of the target with high quality. Non-model methods have no prior information on the target at all, and view planning is performed in real time during modeling. In practical applications, the second category appears frequently and is the more difficult one; thus, this study only summarizes non-model methods. On the basis of the rebuilt model type and the information used during view planning, non-model active object reconstruction methods are divided into three categories: surface based, search based, and combined. The basic ideas of each type of method are explained, and the problems involved are summarized. Surface-based methods use point cloud and triangular patch models; they extract shape information from the obtained local model and classify the shape of the unknown region to determine the next viewpoint. Search-based methods use voxel models; a certain method is employed to determine candidate viewpoints, which are then scored by a reasonable evaluation function, and the candidate viewpoint with the highest score is used as the next best view. Combined methods use both surface and voxel models and merge the advantages of the two approaches to provide effective information for view planning. However, combined methods have not been investigated much recently, and research has mainly focused on the first two categories. Surface-based methods involve the problems of determining the detection direction, predicting the unknown surface, and determining the next-best viewpoint. Search-based methods involve the problems of model type selection, search space determination, undetected area prediction, and design of the evaluation function that ranks candidate viewpoints. The main research methods for these problems are summarized and analyzed, and the solutions to each problem can be combined reasonably to form different active object reconstruction methods.
      Result: In surface-based active object reconstruction methods, the way the detection direction is determined and the unknown area is predicted has an important impact on the view planning effect. When selecting an edge point to determine the detection direction, the quantitative indicator method is more reliable than the spatial position method for expressing the unknown region, but its computational complexity is higher. In addition, using an indirect method to predict an unknown surface may be simpler than using a direct method, but it results in larger fitting errors. In general, surface-based methods are relatively simple, and each round of view planning consumes minimal time. However, because the unknown region is predicted from the trend of its adjacent surface, these methods are only suitable for reconstructing objects with regular shapes. Search-based active object reconstruction methods quantitatively evaluate each candidate view. The octomap model is more efficient than other probabilistic voxel models when selecting model types. Selecting candidate viewpoints with dynamic search space methods has higher computational complexity than with fixed search space methods, but such methods impose no limitation on target size, and their application scenarios are extensive. When predicting the information contained in an unknown voxel, its relative positional relationship with the known voxels can be utilized; using this prediction for the next view planning maximizes the known information compared with not updating the unknown voxels. When determining the evaluation function, information gain modules may be added, and adjacent-frame overlap ratio optimization modules, neighboring viewpoint distance optimization modules, and reconstructed surface quality optimization items may be added as needed. The information gain of a viewpoint is obtained by counting the voxel gain in the field of view; differences in voxel gain calculation and statistical methods directly affect the information gain value of the viewpoint. With these search-based methods, the next view planning works well, but the process is time consuming. Moreover, the problems involved in such methods have a larger solution space than those in surface-based methods; therefore, more research results have been generated for search-based active object reconstruction. However, such methods are relatively computationally intensive, in most cases the planned viewpoints are not continuous in the search space, and point cloud registration is not considered.
      Conclusion: Research on active object reconstruction has made some progress, but the accuracy and efficiency of active reconstruction can still be improved. Other feasible research directions are provided in the end and could serve as a reference for future research, such as introducing a priori information into view planning, combining surface- and search-based methods, and building perceptual intelligence systems suitable for different tasks.
      Keywords: active object reconstruction; active vision; view planning; sensor planning; intelligent perception
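      To make the search-based pipeline concrete, below is a toy next-best-view scoring loop in Python. The evaluation function (visible unknown-voxel count minus a travel-cost penalty) and the placeholder visibility test are assumptions for illustration, not any surveyed method; a real planner would ray-cast through an occupancy grid such as an octomap.

```python
import numpy as np

def visible(view_pos, voxel, max_range=2.0):
    # Placeholder visibility test: range check only. A real planner would
    # ray-cast through the occupancy grid to handle occlusion.
    return np.linalg.norm(view_pos - voxel) < max_range

def score_viewpoint(view_pos, voxels, state, current_pos, alpha=0.1):
    """Toy evaluation function for search-based NBV planning: information
    gain (count of visible unknown voxels) minus a travel-cost penalty.
    `state[i]` is 'unknown', 'occupied', or 'free' for voxel center voxels[i]."""
    gain = sum(1 for i, v in enumerate(voxels)
               if state[i] == 'unknown' and visible(view_pos, v))
    return gain - alpha * np.linalg.norm(view_pos - current_pos)

def next_best_view(candidates, voxels, state, current_pos):
    # Pick the candidate viewpoint with the highest score.
    return max(candidates, key=lambda c: score_viewpoint(c, voxels, state, current_pos))
```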
    • Research progress on copyright protection technology of 3D printing model

      Xiaoqing Feng, Li Li, Jilin Wang, Keming Dong, Yanan Liu, Surong Yan
      Vol. 24, Issue 7, Pages: 1028-1041(2019) DOI: 10.11834/jig.180512
      Abstract:
      Objective: With the rapid evolution of 3D printing and scanning technologies, products on a traditional manufacturing line can be easily and rapidly copied. Copyright issues have become an unavoidable and increasingly prominent problem in the 3D printing era. Users can rapidly and easily copy and produce 3D models through 3D printing-scanning technologies, so copyright protection problems are becoming increasingly serious. How to protect the copyright of 3D printing models effectively has received great attention from researchers in recent years. Copyright protection methods for 3D printing models have been proposed, but the effectiveness and detection rates of the algorithms must still be improved. This work comprehensively analyzes the main literature published worldwide to fully reflect the research status and latest developments in the copyright protection of 3D printing models.
      Method: On the basis of extensive literature research, the 3D printing model file is described in detail in accordance with the basic framework of copyright protection for 3D printing models. The attack types against 3D printing models are elaborated, and the influence of the 3D printing and scanning process on the model is analyzed. The related technologies are then classified in accordance with the technical characteristics of the copyright protection strategies for 3D printing models, and the basic framework and technical features of each type of method are described. Finally, the performance of 3D printing model digital watermarking is analyzed on the basis of the related references, mainly from two aspects: comparison with traditional mesh watermarking algorithms and robustness against the 3D printing-scanning attack.
      Result: A 3D product exists as both a digital model and a physical model. 3D printing refers to the process of converting a digital model into a physical one, and a physical model is transformed into a digital model during scanning. These processes involve multiple rounds of uneven sampling and quantization and are highly complicated. Existing 3D model digital watermarking algorithms mainly focus on the copyright protection of the model in the digital modality alone and ignore the copyright protection of the product model in the physical modality. The characteristics of 3D printing (i.e., print resolution, layer thickness and smoothness, and the stepped layering effect) and the geometric distortion caused by 3D scanning are neglected. Therefore, traditional 3D mesh watermarking strategies cannot be directly applied to 3D printing models. The related technology is divided into two categories, physical characteristic methods and digital watermarking, in accordance with the technical characteristics of the copyright protection strategy. Copyright protection technology based on physical characteristics can embed effective copyright logos into 3D printed models; however, this type of technology requires special equipment to extract the micro-structure information and lacks universality. Methods based on digital watermarking can effectively resist not only similarity transformation, cutting, noise, subdivision, quantization, and smoothing attacks in the traditional digital domain but also the modal transition attack on the 3D model. The high precision of 3D printers and scanners can effectively improve the watermark detection rate.
      Conclusion: 3D printing products are the crystallization of the wisdom and hard work of manufacturers and designers and embody substantial intellectual property. With the extensive penetration of 3D printing into industry, the copyright protection of 3D printing models possesses great application and research value. However, the detection and evaluation mechanisms for the copyright protection of 3D printing models remain limited, and further research is needed. A unified 3D printing model test library and a 3D printing model watermark evaluation system should be developed.
      Keywords: 3D printing; 3D scanning; copyright protection; digital watermarking; evaluation criteria
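      As a rough illustration of the digital watermarking category discussed above, the sketch below perturbs mesh vertex radii with a spread-spectrum-style bit assignment. This is a toy construction under stated assumptions (seed, strength, and bit assignment are invented for illustration), not any published 3D printing watermark scheme.

```python
import numpy as np

def embed_watermark(vertices, bits, strength=1e-3, seed=0):
    """Toy spread-spectrum mesh watermark: nudge each vertex's distance
    from the centroid up or down according to a pseudo-random assignment
    of watermark bits. `vertices` is an (N, 3) array; `bits` is a 0/1 list."""
    rng = np.random.default_rng(seed)
    center = vertices.mean(axis=0)
    radial = vertices - center
    r = np.linalg.norm(radial, axis=1, keepdims=True)
    assign = rng.integers(0, len(bits), size=len(vertices))   # bit index per vertex
    signs = np.where(np.array(bits)[assign] == 1, 1.0, -1.0)[:, None]
    # New radius: original radius plus a tiny signed offset; extraction would
    # correlate observed radial deviations against the same random assignment.
    return center + radial / r * (r + strength * signs * r.mean())
```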

      Image Processing and Coding

    • Haiwen Yu, Xinwei Yi, Shaoping Xu, Guizhen Zhang, Tingyun Liu
      Vol. 24, Issue 7, Pages: 1042-1054(2019) DOI: 10.11834/jig.180534
      Two-stage multi-layer perceptron estimation for random-valued impulse noise ratio
      Abstract:
      Objective: Existing switching random-valued impulse noise (RVIN) removal algorithms mainly detect the noisy pixels of an image by comparing local image statistics with predefined thresholds and then restore the detected noisy pixels pixel by pixel, resulting in low execution efficiency. With regard to computational complexity, convolutional neural network (CNN)-based denoising algorithms that operate at the patch level exhibit a significant advantage over classical switching algorithms that detect and remove RVIN pixel by pixel. However, the restoration performance of CNN-based denoising algorithms remains limited by the accuracy of estimating the distortion level of the given noisy image. In essence, a CNN-based denoising algorithm is still a non-blind method: the optimal denoising effect can only be obtained by training a specific denoising model at a fixed noise level, which limits practical application. For simplicity, the noise ratio, i.e., the number of detected noisy pixels divided by the total number of image pixels, can be treated as a measure of the distortion level of a noisy image. CNN-based denoising methods can then blindly and efficiently remove RVIN with high quality by adaptively invoking the corresponding pre-trained denoisers in accordance with the estimated noise ratio. A two-stage noise ratio estimation algorithm based on a multi-layer perceptron (MLP) is proposed in this paper to estimate the noise ratio precisely.
      Method: A large set of clean images was first corrupted with RVIN at different ratios to form a set of noisy images. Features that reflect the distortion level of a noisy image were then extracted and screened to form a feature vector for each noisy image on the basis of a visual codebook and soft-assignment coding. Subsequently, the feature vectors and their corresponding noise ratios were used as the input and output of the multi-layer perceptron, respectively, to train a noise ratio estimation model that maps a given feature vector to its noise ratio. Generally, many hidden layers are required in an MLP architecture to obtain an ideal approximation function, but an MLP-based regression model with multiple hidden layers is difficult to converge and slow to train. Therefore, a coarse-to-fine two-stage strategy was used to improve the estimation accuracy. Specifically, a relatively coarse estimation model was trained across the entire range of noise ratios. The noise ratio range was then divided into several sub-ranges, which diminishes the mapping range of each estimation model, and several fine estimation models were trained on the different sub-ranges. Each subinterval overlaps with its adjacent subinterval to avoid estimation inaccuracy at the subinterval extremities. In the prediction phase, a preliminary estimate is first obtained with the coarse model, and the corresponding fine model then predicts the noise ratio more accurately.
      Result: Comparison experiments tested the validity of the proposed method from three aspects: estimation accuracy, denoising effect, and execution efficiency. The proposed method was first compared with the noise detectors of several classical RVIN denoising methods, such as PSMF, ROLD-EPR, ASVM, and ROR-NLM, to demonstrate the estimation accuracy. Because those switching methods output the number of detected noisy pixels, the count was converted into a noise ratio. Results show that the estimation error of the proposed method is less than 2% across different noise ratios, showing stronger robustness than the others. The feed-forward denoising convolutional neural network (DnCNN) designed for removing Gaussian noise was adapted to RVIN removal to verify the availability of the proposed method. In the denoising comparison, peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and feature similarity (FSIM) were adopted as image quality assessment indexes. For distorted images with RVIN ratios from 10% to 60%, the PSNR values obtained by the improved DnCNN algorithm with the proposed estimator are about 2 dB higher than those of the others. The FSIM values rank second for different noise ratios, whereas the SSIM values approximate the optimal results. In qualitative visual evaluation, the improved DnCNN algorithm with the proposed estimation model generates clear restored images with enhanced edge preservation. It also outperforms the switching RVIN removal methods in execution efficiency, taking only 3.8 s to restore an image of size 512×512 pixels.
      Conclusion: Extensive experiments show that the estimation accuracy of the proposed MLP-based noise ratio estimation algorithm is robust across a wide range of noise ratios. With the proposed noise estimation model, CNN-based RVIN removal algorithms can achieve optimal blind denoising by exploiting the closest matching model. Moreover, the improved DnCNN denoising algorithm with the noise ratio estimation module significantly outperforms traditional switching RVIN denoising algorithms in both denoising effect and execution efficiency, rendering it highly practical.
      Keywords: image denoising; random-valued impulse noise (RVIN); noise ratio estimation; noise ratio-aware feature; multi-layer perceptron network; computational efficiency
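      A minimal sketch of the coarse-to-fine strategy with scikit-learn's MLPRegressor is shown below. The feature extraction (a visual codebook with soft-assignment coding in the paper) is assumed done elsewhere; the sub-range count, overlap width, and network sizes are illustrative.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def train_two_stage(X, y, n_bins=3):
    """Coarse model over the full noise-ratio range [0, 1], then one fine
    model per overlapping sub-range (overlap avoids boundary inaccuracy)."""
    coarse = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000).fit(X, y)
    edges = np.linspace(0, 1, n_bins + 1)
    fine = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        m = (y >= max(lo - 0.05, 0)) & (y <= min(hi + 0.05, 1))
        fine.append(MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000).fit(X[m], y[m]))
    return coarse, fine, edges

def estimate(x, coarse, fine, edges):
    """Coarse prediction selects the sub-range; its fine model refines it."""
    rough = float(np.clip(coarse.predict([x])[0], 0, 1))
    k = min(np.searchsorted(edges, rough, side='right') - 1, len(fine) - 1)
    return float(fine[k].predict([x])[0])
```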
    • Sparsity image inpainting algorithm based on similar patch group

      Xingxin Lou, Xianghong Tang, Yue Zhang
      Vol. 24, Issue 7, Pages: 1055-1066(2019) DOI: 10.11834/jig.180567
      Abstract:
      Objective: In traditional patch-group-based image inpainting, it is easy to synthesize incorrect filling patches, causing incoherence at edges because of mismatched patches. A traditional exemplar-based inpainting algorithm uses only a single patch to fill a damaged area, which makes it prone to strong artifacts: its consistency with the surrounding structure is uncertain, so it can leave strong inpainting traces that degrade visual quality. To solve these problems, an improved sparsity image inpainting algorithm based on a similar patch group is proposed in this study.
      Method: First, searching for similar patch groups with color differences alone easily matches incorrect patches whose structural trend obviously differs from the target patch. The cosine distance is an effective tool for measuring the change in direction between vectors and can measure proximity in the vector direction. Thus, the patch matching criterion is defined in this study by combining color information with the cosine distance. In this manner, matching patches whose structural trend is similar to the target patch can be obtained, and the mismatching rate can be reduced. Second, in sparse reconstruction, to enhance the capability to filter matched patches and decrease texture blur, the unknown pixels are estimated from the similar patch group obtained in the first step. Considering that different matching patches contribute differently to the filled area, and to give the sparse representation model the capability of filtering matching patches, the known and estimated unknown information is used to calculate the matching degree between every similar patch and the target patch and to assign different weights to the sparse coefficients. Finally, repairing the detailed structure of an area with a fixed patch size is difficult because an image always has rich and varied information. Structural sparsity reflects structural complexity; thus, the patch size is decided adaptively by structural sparsity in this study. Different from methods that use a fixed threshold to determine the patch size, the proposed method iteratively calculates structural sparsity and decreases the patch size until the structural complexity of the patch falls to the average value, which helps reduce error propagation during inpainting.
      Result: Several representative color images, artificially damaged in different shapes, are inpainted in Figs. 5-8. The repair results for the same image damaged with different kinds of shapes are also compared, with a detailed view of the result in Fig. 9. Experimental results show that the proposed algorithm is more effective in subjective vision than related algorithms and alleviates fracture or texture extension during edge structure repair. Further experimental data are given in Tables 1-2. The results show a significant improvement in subjective vision, and the peak signal-to-noise ratio is improved by approximately 0.53 dB. However, the running time increases because the method must iteratively calculate the patch structural sparsity and dynamically adjust the adaptive patch size.
      Conclusion: This study proposes a weighted sparsity image inpainting method based on a similar patch group. The method combines color information with the cosine distance so that patches whose structural trends are similar to the target patch can be obtained while searching for similar patch groups, making patch propagation precise. Moreover, different sparse weights are assigned according to the degree of similarity; thus, structural information is preserved as much as possible during inpainting, and texture blur is reduced. The combination of the two strategies improves the efficiency of the inpainting algorithm. The proposed algorithm inpaints damaged color images with different structural features and shapes, and comparison shows that it is more efficient than existing color image inpainting methods. Results show that the improved method has an ideal repair effect on strong edge structures and multi-structure areas, effectively improving image restoration quality. Nevertheless, the method only utilizes information within a certain range of the image; shortcomings, such as unsatisfactory repair in the case of inconsistent edge structure and the inability to maintain edge curvature, still exist, so the method is unsuitable for all kinds of images. Future research will focus on the recovery of edge curves and irregular texture breakage.
      Keywords: color image inpainting; matching criterion; similar patch group; sparse constraint; adaptive patch size
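      The matching criterion can be sketched as follows; the relative weighting of the color and cosine terms is an assumption, since the abstract does not give the exact formula.

```python
import numpy as np

def match_cost(target, candidate, mask, w=0.5):
    """Illustrative patch-matching criterion combining color difference
    (SSD on known pixels) with cosine distance, in the spirit of the paper.
    `mask` marks the known pixels of the partially damaged target patch."""
    t = target[mask].astype(float).ravel()
    c = candidate[mask].astype(float).ravel()
    color = np.mean((t - c) ** 2) / 255.0 ** 2
    cosine = 1.0 - np.dot(t, c) / (np.linalg.norm(t) * np.linalg.norm(c) + 1e-12)
    return color + w * cosine

def best_patches(target, mask, candidates, k=5):
    # Return indices of the k most similar candidate patches (the patch group).
    costs = [match_cost(target, c, mask) for c in candidates]
    return np.argsort(costs)[:k]
```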

      Image Analysis and Recognition

    • Wenxia Bao, Dongwen Xie, Ming Zhu, Dong Liang
      Vol. 24, Issue 7, Pages: 1067-1075(2019) DOI: 10.11834/jig.180504
      Gesture recognition based on aggregate channel feature and dual-tree complex wavelet transform
      Abstract:
      Objective: With the continuous development of society, people's expectations for a convenient life keep rising, and technological progress has brought increasingly convenient lifestyles. Human-computer interaction plays an increasingly important role and has become a powerful tool for work, life, and play. Traditional human-computer interaction devices, such as keyboards, mice, and touch screens, operate accurately but restrict people's use and limit natural interaction. Therefore, gesture recognition based on images or video streams is an important research direction. Gestures are more natural and flexible than traditional I/O devices, which makes gesture recognition technology a major research topic. Numerous methods process input images or videos with techniques such as machine learning and image processing to achieve real-time gesture interaction; this line of work is an active development in the computer vision field. The categories corresponding to gestures are determined by detecting hand feature information in the extracted image or video stream, providing technical support for these applications. In many cases, the human body background in the scene is complex and diverse, and the lighting, distance, and angle of the hand relative to the camera vary arbitrarily. Thus, studying gesture recognition in complex environments has become highly important. Current gesture recognition methods are affected by the environment, lighting, rotation, zoom, and skin color, resulting in low accuracy and speed. To solve this problem, a gesture recognition method is proposed that combines aggregate channel feature (ACF)-based gesture detection with dual-tree complex wavelet transform (DTCWT) feature extraction in the frequency domain for complex backgrounds. The aggregate channel feature comprises 10 image channels; the pixel features of each channel are processed, filtered, and fused to obtain the ACF.
      Method: During gesture image preprocessing, a gesture target detection method using multi-channel feature fusion is introduced as the basic step of gesture recognition. An Adaboost classifier and a non-maximum suppression algorithm are used to detect target gestures. DTCWT processing is then performed on the target gesture image cropped after detection; multiscale, multi-directional decomposition yields high- and low-frequency coefficients. Histogram of oriented gradients (HOG) and local binary pattern (LBP) features are extracted for each block of the high- and low-frequency coefficients, respectively. Finally, the fused high-low frequency features are classified by a support vector machine model. The identification problem is thus divided into two stages: the first stage detects the target area and removes the background area, which significantly improves the efficiency of gesture recognition and paves the way for accurate classification in the second stage.
      Result: Images of multiple scenes and subjects at different angles and distances were selected as the training set, with foreground and background distinguished. A total of 20 types of gestures were identified, and the method was experimentally compared with traditional skin color detection, HOG-feature gesture recognition, and class-Hausdorff distance gesture recognition algorithms. For illumination and distance within any acceptable range, the method accurately realizes gesture recognition in real time, and the average precision reaches 95.1%.
      Conclusion: The algorithm exhibits three advantages. First, the introduced gesture target detection algorithm enables accurate positioning and cropping of the hand region even under skin color interference in a complex background, and normalization to a fixed size solves the problem caused by gesture scaling. Second, DTCWT extracts the high- and low-frequency coefficients of the image in the frequency domain, and features are calculated on the high and low frequencies, respectively. The influence of lighting and rotation is reduced by extracting signal features of different components, which decreases redundancy and feature dimensions and improves feature extraction efficiency. Third, DTCWT offers translation invariance, direction selectivity, and a small amount of redundancy; the method computes quickly with little memory and can effectively run in real time. When the gesture area is accurately detected, the proposed algorithm achieves satisfactory results. In future work, we will further improve the accuracy of hand detection and classification, and a deep neural network will be used to handle more datasets and gesture types, addressing the small factors that may cause misidentification and making gesture recognition more efficient and practical.
      Keywords: aggregate channel feature; dual-tree complex wavelet transform (DTCWT); histogram of oriented gradient (HOG) features; local binary pattern (LBP) features; feature fusion; support vector machine (SVM)
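      A compressed sketch of the feature stage is given below. pywt's separable wavelet transform stands in for DTCWT (a faithful implementation could use the dtcwt package instead), and block-wise extraction is collapsed to whole-subband features for brevity; all parameters are illustrative.

```python
import numpy as np
import pywt
from skimage.feature import hog, local_binary_pattern
from sklearn.svm import SVC

def frequency_features(gray):
    """HOG on the high-frequency magnitudes and an LBP histogram on the
    low-frequency approximation, then concatenated (a simplified stand-in
    for the paper's per-block DTCWT features)."""
    cA, (cH, cV, cD) = pywt.dwt2(gray.astype(float), 'haar')
    high = np.abs(cH) + np.abs(cV) + np.abs(cD)
    hog_feat = hog(high, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    lbp = local_binary_pattern(cA, P=8, R=1, method='uniform')
    hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    return np.concatenate([hog_feat, hist])

# The fused features are classified with an SVM, as in the final stage:
# clf = SVC(kernel='linear').fit([frequency_features(g) for g in train_crops], labels)
```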
    • 3D facial expression recognition based on weighted local curl patterns

      Jing Yu, Feipeng Da
      Vol. 24, Issue 7, Pages: 1076-1085(2019) DOI: 10.11834/jig.180473
      Abstract:
      Objective: Facial expression recognition is an interesting and challenging problem and underpins important applications in many areas, such as human-computer interaction and data-driven animation. Research on 3D facial expression recognition has become popular in recent years with the development of 3D capture techniques and face recognition. 3D face data are robust to pose, illumination, and gesture and contain more topological and geometric information than traditional 2D face data; thus, 3D face data can describe the facial muscle movements caused by expressions. Feature- and model-based methods are the two main approaches to 3D facial expression recognition, with the former generally better in computational efficiency. Although feature-based methods have been successfully applied to facial expression recognition, existing ones cannot achieve the desired recognition effects. First, they are limited by feature extraction. Some features rely on the accurate positions of labeled landmarks; despite high recognition rates, such features meet many practical difficulties because of landmark localization. Other features extracted without labeled landmarks facilitate automatic expression recognition but may carry limited information and sometimes fail to distinguish similar expressions. Thus, extracting effective features without labeled landmarks, features that describe the deformation of facial muscles during expressions, is key to improving 3D facial expression recognition rates. Furthermore, proper feature weights can strengthen the discriminative capability of features, considering that different facial parts have varying importance for expression recognition. Therefore, we propose an algorithm for 3D facial expression recognition using weighted local curl patterns: local curl pattern features are extracted, and accurate recognition is then achieved by computing their weights.
      Method: The proposed algorithm includes two parts. First, local curl patterns are extracted as highly discriminative features on the basis of the curl vectors of the 3D face to represent the facial surface changes caused by expressions. Curl vectors have been shown to outperform other features in face representation: the vector direction can represent the spatial location of the 3D surface, and the vector length can describe the shape characteristics. Therefore, we construct local curl patterns on the basis of curl vectors; the coding principle is the same as that of local binary patterns in 2D images. Second, local curl patterns are assigned weights, considering that different parts of the face have varying sensitivity to expressions. We propose combining the ICNP (interactive closest normal points) algorithm with a minimum projection error algorithm to calculate the feature weights. Finally, the weighted local curl pattern features are fed into the classifier, which predicts the expression class of the 3D face.
      Result: The algorithm is verified by recognizing nine different expressions, including calmness, smile, laugh, surprise, fear, anger, sadness, and disgust, on the BU-3DFE (Binghamton University 3D facial expression) database, which was developed for 3D facial expression studies. The database contains 100 subjects, including 56 females and 44 males from various ethnic groups and ages. Then, 20 of the subjects are selected randomly for training the classifier, and the remaining 80 are used for recognition. First, the discrimination power of local curl patterns is evaluated via PCA (principal components analysis)-relative entropy; discrimination power refers to the ratio of inter- to intra-class similarity, and the higher it is, the better the recognition results. In our experiment, local curl patterns are shown to have the highest discrimination power among common expression features, including normal vectors and the shape index. Second, on the BU-3DFE database, the mean recognition rate of our approach is 89.67%, which is comparable to other methods. The proposed method also achieves low error rates among angry, sad, and disgusted faces, which are often confused in expression recognition: the error rate between angry and sad is 6.26%, and that between angry and disgust is 5.38%.
      Conclusion: Local curl patterns effectively extract the features of 3D faces and perform well in expression recognition. The combination of the ICNP and minimum projection error algorithms, which is more effective than traditional approaches, enhances the discrimination power of local curl patterns. Experimental results show that the proposed approach is comparable to state-of-the-art methods in accuracy and performs especially well in recognizing confusable expressions.
      Keywords: 3D facial expression recognition; local features; curl features; division of 3D face; weights of features
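      The LBP-style coding principle mentioned in the Method can be sketched on a scalar field of curl magnitudes as follows; the 8-neighborhood, grid sampling, and histogram size are illustrative assumptions rather than the authors' exact construction on the facial mesh.

```python
import numpy as np

def local_pattern_code(field, y, x):
    """LBP-style binary coding of a scalar field (e.g., curl-vector
    magnitudes sampled on a grid): each of the 8 neighbors contributes
    one bit, set when it is at least as large as the center value."""
    center = field[y, x]
    neighbors = [field[y - 1, x - 1], field[y - 1, x], field[y - 1, x + 1],
                 field[y, x + 1], field[y + 1, x + 1], field[y + 1, x],
                 field[y + 1, x - 1], field[y, x - 1]]
    return sum(int(v >= center) << i for i, v in enumerate(neighbors))

def pattern_histogram(field):
    # Per-region histograms of these codes would then be weighted
    # (e.g., by the ICNP / minimum-projection-error step) and concatenated.
    codes = [local_pattern_code(field, y, x)
             for y in range(1, field.shape[0] - 1)
             for x in range(1, field.shape[1] - 1)]
    hist, _ = np.histogram(codes, bins=256, range=(0, 256), density=True)
    return hist
```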
    • Junying Gan, Ling Qi, Chuanbo Qin, Guohui He
      Vol. 24, Issue 7, Pages: 1086-1095(2019) DOI: 10.11834/jig.180499
      Lightweight fingerprint classification model combined with transfer learning
      Abstract:
      Objective: Fingerprint biometrics plays a vital role in authenticating a person properly. Fingerprint classification is crucial because it minimizes the search time over a large database, whereas authentication plays a significant role in the fingerprint identification system. Several problems, such as complex operation processes, numerous parameters, large-scale data, and inadequate use of fingerprint information, still prevail.
      Method: Traditional machine learning methods include supervised, unsupervised, and semi-supervised learning. Most methods assume that the distribution of annotated data is the same as that of unannotated data. By contrast, transfer learning allows the domains, tasks, and distributions used in training and testing to differ and is highly concerned with target task training; the roles of the source and target tasks are no longer symmetric. Transfer learning can not only overcome the drawback of small data scales but also make the model highly personalized. Therefore, a novel lightweight fingerprint classification model based on transfer learning is presented in this paper, combining transfer learning with small target data to improve network generalization. First, a classical gradient-estimation method is used to obtain the orientation field in three steps: Gaussian smoothing, gradient calculation, and orientation field estimation. The orientation field of the fingerprint image is enhanced to obtain large-scale data. The enhanced image is then used as the input of the lightweight Finger-SqueezeNet proposed in this work, and an effective classification is obtained to initially adjust the parameters. The Finger-SqueezeNet model is mainly composed of five fire modules and two convolutional layers. The fire module reduces network parameters by replacing part of the 3×3 convolutions with 1×1 convolutions, while parallel 3×3 and 1×1 convolutions are concatenated as the module output to guarantee classification accuracy. Finally, the parameters of the pre-trained network are fine-tuned on NIST-DB4, while the parameters of the front layers are retained; feature and model migration are used in combination. The fingerprint can complete transfer learning at the feature level because the fingerprint orientation field map and the fingerprint image belong to different feature expression spaces. Assuming that model parameters can be shared between the source and target data, model-based transfer learning is implemented by finding the shared parameter information between the source and target domains, and the model is then fine-tuned for directed optimization.
      Result: The average classification accuracy of the proposed network without transfer learning is approximately 93%, whereas the network with pre-trained transfer learning reaches 98.45%. Moreover, the test result obtained by single-fingerprint validation reaches 95.73%. In general, the classification of whorl fingerprints performs well among the five categories, whereas arch and tented arch perform relatively poorly because 17.5% of these two fingerprint types have fuzzy labels. To improve the result, the two types of fuzzy fingerprints were separated from their classes and rejected from recognition, resulting in a zero rejection rate for the remaining samples. Final results show that the model exhibits strong generalization capability and high stability for fingerprint classification across different qualities. In addition, the presented model dramatically reduces the number of parameters while maintaining high accuracy.
      Conclusion: The fingerprint intra-class transfer learning method combined with a lightweight neural network can not only fully utilize fingerprint information but also shows promising application prospects on mobile terminals.
      Keywords: fingerprint classification; transfer learning; Finger-SqueezeNet; lightweight neural network; orientation field
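      The fire module described above matches the standard SqueezeNet construction, which can be written in PyTorch as follows; the channel sizes are illustrative, and the frozen-front-layer comment mirrors the fine-tuning step in the abstract.

```python
import torch
import torch.nn as nn

class Fire(nn.Module):
    """SqueezeNet-style fire module: a 1x1 squeeze layer followed by
    parallel 1x1 and 3x3 expand convolutions whose outputs are
    concatenated along the channel dimension."""
    def __init__(self, in_ch, squeeze_ch, expand_ch):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, kernel_size=1)
        self.expand1 = nn.Conv2d(squeeze_ch, expand_ch, kernel_size=1)
        self.expand3 = nn.Conv2d(squeeze_ch, expand_ch, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        s = self.relu(self.squeeze(x))
        return torch.cat([self.relu(self.expand1(s)),
                          self.relu(self.expand3(s))], dim=1)

# Transfer learning as in the paper: retain (freeze) the front layers,
# fine-tune the rest on the target data, e.g.:
# for p in model.front_layers.parameters():
#     p.requires_grad = False
```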
    • Qing Zhang, Yun Li, Wenju Li, Jiajun Lin, Mang Xiao, Feiyun Chen
      Vol. 24, Issue 7, Pages: 1096-1105(2019) DOI: 10.11834/jig.180224
      Salient object detection via deep features and multiple kernel boosting learning
      Abstract:
      Objective: Salient object detection identifies the most conspicuous and eye-attracting objects or regions in images. Results are often expressed as saliency maps, in which the intensity of each pixel represents the probability that the pixel belongs to a salient region. Visual saliency detection has been used as a preprocessing step for a wide range of vision applications, including image and video compression, image retargeting, visual tracking, and robot navigation. Although the performance of salient object detection has improved dramatically in the last few years, it remains challenging in computer vision tasks. Most existing methods focus on handcrafted features and use distinct prior knowledge, such as contrast, center, background, and objectness priors, to enhance performance. Recently, convolutional neural network (CNN)-based approaches have proven remarkably effective and have broken the limits of traditional handcrafted-feature methods; these CNN-based models, especially end-to-end ones, show superior feature extraction and efficiently capture high-level information about objects and their cluttered surroundings. Existing handcrafted-feature salient object detection algorithms cannot effectively suppress irrelevant backgrounds or uniformly highlight the entire salient object, especially on complicated images with large salient objects, cluttered backgrounds, or multiple salient objects. We propose a salient object detection scheme based on multiple kernel boosting learning and deep semantic information to overcome these drawbacks.
      Method: First, we segment the input image into multiscale superpixels and obtain weak saliency maps through graph-based manifold ranking. Second, we extract deep features involving semantic information using a classic CNN. We obtain reliable training sets through the multiscale weak saliency maps to develop a strong salient object detection model via multiple kernel boosting learning. Saliency maps are then produced directly from samples of the multiscale superpixel images and fused to generate a strong saliency map. Finally, a pixel-level saliency map is refined in accordance with color and position to improve detection performance.
      Result: The proposed model is compared with 11 state-of-the-art methods in terms of precision, recall, F-measure, PR (precision-recall) curves, weighted F-measure, OR (overlapping ratio), MAE (mean absolute error) scores, and visual effect on three popular public datasets: MSRA5K, ECSSD, and SOD. Experimental results show improvements over the state-of-the-art methods. Compared with the saliency results of the non-end-to-end deep learning model that ranks second, the F-measure of our algorithm increases by 0.7%, 2.0%, and 2.1%; the weighted F-measure by 18.9%, 27.6%, and 19.8%; the OR scores by 2.9%, 6.8%, and 7.2%; and the MAE scores improve by 34.5%, 26.9%, and 7.5% on MSRA5K, ECSSD, and SOD, respectively. The visual-effect experiments show that our method performs well on various complex images, such as salient objects and backgrounds with similar appearance, multiple salient objects, salient objects with complex texture and structure, and cluttered backgrounds. The proposed approach not only uniformly highlights the entire salient object but also efficiently preserves its contour under various scenarios. Moreover, we conduct experiments on the three datasets in terms of PR curves to evaluate each component of the proposed algorithm, and we report the average running time of our algorithm and of the methods based on non-end-to-end CNNs. The implementation is evaluated on the ECSSD dataset using MATLAB or C, and most test images have a resolution of 300×400 pixels. An efficient C/C++ implementation with parallelized components would decrease the model's computation time and render it feasible for real-world application.
      Conclusion: The proposed salient object detection model, which learns a strong classifier from four single-kernel SVMs (support vector machines) and uses a classic CNN, performs well on complicated images compared with salient object detection methods based on handcrafted features. Further improvements on datasets with complex and confusing backgrounds are worth expecting. In further research, we plan to utilize additional CNN features and construct an end-to-end model, which would improve performance and save computation cost. Our future work will also pay attention to small salient object detection in video.
      Keywords: salient object detection; saliency detection; deep feature; multiple kernel boosting learning; multiscale detection
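      A toy version of multiple kernel boosting over single-kernel SVMs is sketched below; the kernel set, round count, and AdaBoost-style reweighting details are assumptions, since the abstract does not specify them.

```python
import numpy as np
from sklearn.svm import SVC

def multiple_kernel_boosting(X, y, kernels=('linear', 'rbf', 'poly', 'sigmoid'), rounds=4):
    """Each round fits single-kernel SVMs on reweighted samples, keeps the
    one with the lowest weighted error, and combines them AdaBoost-style.
    Labels y are expected in {-1, +1}."""
    w = np.full(len(y), 1.0 / len(y))
    ensemble = []
    for _ in range(rounds):
        best = min((SVC(kernel=k).fit(X, y, sample_weight=w) for k in kernels),
                   key=lambda m: np.sum(w * (m.predict(X) != y)))
        err = max(np.sum(w * (best.predict(X) != y)), 1e-12)
        if err >= 0.5:
            break
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(alpha * (best.predict(X) != y))   # up-weight mistakes
        w /= w.sum()
        ensemble.append((alpha, best))
    return ensemble

def predict(ensemble, X):
    # Weighted vote of the boosted single-kernel SVMs.
    return np.sign(sum(a * m.predict(X) for a, m in ensemble))
```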
    • Huilan Luo, Wu Shi
      Vol. 24, Issue 7, Pages: 1106-1115(2019) DOI: 10.11834/jig.180586
      Adaptive weighted object tracking algorithm with continuous convolution operator
      Abstract:
      Objective: In visual tracking, efficient feature representation is the key to robust tracking, and different convolutional layers represent different aspects of the target in correlation filter tracking. An adaptive weighted object tracking algorithm with a continuous convolution operator is proposed.
      Method: A continuous convolution operator converts discrete position estimates into continuous ones to solve the problem of inaccurate target localization, making position estimation highly accurate. The feature representations of different convolutional layers are leveraged to improve tracking: in deep convolutional neural networks, shallow features carry substantial positional information, whereas deep features carry considerable semantic information. Therefore, combining them for feature expression and tracking yields better results than using deep or shallow features alone. First, multi-layer convolutional features are extracted with a deep convolutional network. The weight of each layer's features in the fused representation for the next frame is determined by calculating the correlation convolution response, highlighting the dominant features and making the target highly distinguishable from the background or distractors. Then, the correlation filters trained on the different layers are correlated with the extracted features to obtain the final response map, and the position of its maximum is used to calculate the position and scale of the target. The weights of the convolutional feature layers are adaptively updated according to the correlation filtering tracking effect of each layer, fully exploiting the expression capability of the different layers, and the expression scheme is adaptively adjusted to the environmental conditions of each frame to improve tracking performance.
      Result: The average success rate of the proposed algorithm is 85.4% compared with three state-of-the-art tracking algorithms on the 50 video sequences of the object tracking benchmark (OTB-2013) dataset.
      Conclusion: Experimental results show that the proposed tracking algorithm performs well and can successfully and efficiently handle many complicated situations, such as illumination variation, scale variation, background clutter, object rotation, and occlusion.
      Keywords: object tracking; correlation filter tracking; continuous convolution operator; adaptive weighted; convolution features; response map
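      One way to realize the adaptive layer weighting described above is to weight each layer by the quality of its correlation response. The sketch below uses the peak-to-sidelobe ratio and an exponential update, both assumptions rather than the paper's exact rule.

```python
import numpy as np

def psr(response):
    """Peak-to-sidelobe ratio: a common proxy for how reliable one
    convolutional layer's correlation response is in a given frame."""
    peak = response.max()
    py, px = np.unravel_index(response.argmax(), response.shape)
    side = np.delete(response, np.ravel_multi_index((py, px), response.shape))
    return (peak - side.mean()) / (side.std() + 1e-12)

def fuse_responses(responses, weights, lr=0.2):
    """Drift the per-layer weights toward the layers whose responses are
    currently most confident, then fuse the response maps."""
    quality = np.array([psr(r) for r in responses])
    weights = (1 - lr) * weights + lr * quality / quality.sum()
    fused = sum(w * r for w, r in zip(weights, responses))
    return fused, weights / weights.sum()
```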
    • Learning transferrable attention for joint balanced domain adaptation

      Ronggui Wang, Xuchen Yao, Juan Yang, Lixia Xue
      Vol. 24, Issue 7, Pages: 1116-1125(2019) DOI: 10.11834/jig.180497
      Abstract:
      Objective: Many image recognition methods perform well when the training and test data are drawn from the same distribution but degrade in practical scenarios where the distributions differ. Domain adaptive methods are an effective way to solve this problem: domain adaptation addresses settings where data come from two related domains with different distributions. In practical applications, labeling data takes substantial manual labor, so unsupervised learning has become a clear trend in image recognition. Transfer learning can extract knowledge from the labeled data in the source domain and transfer it to the unlabeled target domain.
      Method: We propose a joint balanced adaptive method based on an attention transfer mechanism, which transfers feature representations learned from labeled source-domain datasets to unlabeled target-domain datasets. Specifically, we first transfer the labeled source-domain spatial category information to the unlabeled target domain via the attention transfer mechanism. Neural networks reflect basic characteristics of the human brain, and attention is an important part of the human visual experience, closely related to perception. Artificial attention mechanisms developed as artificial neural networks became increasingly popular in fields such as computer vision and pattern recognition; letting a system learn where to attend has become a tool for understanding the mechanisms behind neural networks. Attention information can significantly improve image recognition accuracy by defining the attention of convolutional neural networks (CNNs). In this study, attention is seen as a set of spatial maps that encode the input regions the network focuses on most in determining its output. Second, we introduce a prior distribution over the network parameters on the basis of the target dataset and endow each layer with the capability to automatically learn the degree of alignment that should be pursued at different levels of the network. We aim to explore abundant source-domain attributes through cross-domain learning and to capture complex cross-domain knowledge by embedding cross-dataset information, minimizing the loss of the original objective for the learning tasks in both domains as much as possible. Machine learning recognizes refined features after raw data are preprocessed into features based on human prior knowledge, and for years experts spent most of their time designing features because recognition results depend on feature quality. The recent breakthrough in object recognition has been achieved mainly by deep CNN approaches, whose feature extraction and image representation are more powerful than manually defined features such as HOG and SIFT. The higher the network layer, the more specific the features are to the target categorization task. Meanwhile, the features of successive layers interact in complex and fragile ways, so neurons in neighboring layers co-adapt during training; therefore, the transferability of features and classifiers decreases as the cross-domain difference increases. Finally, we describe the input distribution of the domain-specific adaptive alignment layer by introducing cross-domain biases, quantitatively indicating the inter-domain adaptation degree each layer learns, and we adaptively change the weight of each category in the dataset. Deep CNN is a unified training and prediction framework combining multi-level feature extractors and recognizers, so end-to-end processing is particularly important; the design of our model fully utilizes the capability of CNNs for end-to-end processing.
      Result: The average recognition accuracies of the method on the Office-31 and Office-Caltech datasets are 77.6% and 90.7%, respectively. The method significantly outperforms traditional methods based on handcrafted features and is comparable to state-of-the-art methods. Although not every single transfer task achieves the optimal result, the average recognition accuracy over the six transfer tasks is improved compared with current mainstream methods.
      Conclusion: Transferring image features learned from labeled source-domain data to the unlabeled target domain effectively solves problems where data from two domains are related but differently distributed. The method fully utilizes the spatial location information of the labeled source data through the attention transfer mechanism and uses the deep CNN to automatically learn the degree of feature alignment between domains. Learning capability largely depends on the degree of inter-domain correlation, a major limitation of transfer learning; knowledge transfer is apparently ineffective if no similarity exists between the domains. Thus, we fully consider the feature correlation between the source- and target-domain datasets and adaptively change the weight of each category. Our method not only achieves high recognition accuracy but also learns the degree of feature alignment between domains automatically, verifying that inter-domain feature transfer can improve network optimization.
      Keywords: transfer learning; domain adaptation; attention mechanism; unsupervised learning; image recognition; convolutional neural networks
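      The spatial attention maps described above can be built from CNN activations. The sketch below uses the common sum-of-squared-activations construction from the attention-transfer literature, an assumption since the abstract gives no formula.

```python
import torch

def spatial_attention(feat):
    """Activation-based spatial attention: sum of squared channel
    activations, L2-normalized per feature map.
    feat: (batch, channels, H, W) activations from a CNN layer."""
    amap = feat.pow(2).sum(dim=1)                       # (batch, H, W)
    flat = amap.flatten(1)
    flat = flat / (flat.norm(dim=1, keepdim=True) + 1e-12)
    return flat.view(amap.shape)

def attention_transfer_loss(feat_src, feat_tgt):
    # Align attention maps across domains (labeled source vs. unlabeled target).
    return (spatial_attention(feat_src) - spatial_attention(feat_tgt)).pow(2).mean()
```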
    • Kaiwei Zhou, Jianwu Wan, Hongyuan Wang, Hongliang Ma
      Vol. 24, Issue 7, Pages: 1126-1135(2019) DOI: 10.11834/jig.180553
      Joint label prediction and discriminant projection learning for semi-supervised canonical correlation analysis
      Abstract:
      Objective: Canonical correlation analysis (CCA) is a classic method of multi-view learning. Existing CCA-based methods often embed the label information of samples into the model to improve the discriminative capability of the learned projection directions. However, obtaining label information in real life is difficult and requires substantial manpower and material resources. Scholars have therefore proposed semi-supervised canonical correlation analysis, which learns projection directions from a limited number of labeled data and a large quantity of unlabeled data. However, existing semi-supervised CCA models adopt a two-step strategy: the model is built after label prediction, so label prediction and model development are independent. Using the predicted labels of unlabeled samples to build the model leads to locally optimal projection directions, which affects the subsequent classification results. This work proposes joint label prediction and discriminant projection learning for semi-supervised canonical correlation analysis to solve the semi-supervised learning problem and the shortcomings of the two-step strategy.
      Method: The algorithm combines label prediction with model development. Specifically, label prediction is integrated into the framework of canonical correlation analysis. The label matrix of the training samples learned by the joint framework is used to update the projection directions, and the learned projection directions in turn renew the labels of the unlabeled data. Label prediction and projection learning depend on each other and are updated alternately. The predicted label values should be as close as possible to their true values, which benefits learning the optimal projection directions. The joint framework is optimized with an alternate iterative strategy to achieve optimal values for the predicted labels and projection directions. The discriminant features of the test images are extracted with the discriminant projection directions acquired by the joint framework and are finally categorized by the classifier.
      Result: Experiments are performed on four face datasets: AR, Extended Yale B, Multi-PIE, and ORL. The results show that the proposed method obtains enhanced recognition with only a few features and little labeled data. Specifically, three face images of each person in the training samples are selected as supervised samples to analyze the effect of the feature dimension. For all methods, recognition improves as the face image dimension increases, and the recognition rate of the proposed algorithm exhibits significant advantages at low dimensions. When the feature dimension is 20, the recognition rates on the AR, Extended Yale B, Multi-PIE, and ORL datasets are 87%, 55%, 83%, and 85%, respectively. Then, 2 (3, 4, 5) face images of each person are selected as supervised samples to analyze the effect of the number of labeled data; for all methods, recognition improves as the number of labeled face images grows. With five supervised face images per person, the recognition rates on the AR, Extended Yale B, Multi-PIE, and ORL datasets are 94.67%, 68%, 83%, and 85%, respectively.
      Conclusion: This work proposes a joint learning method that renders the learned projection directions highly discriminative; it can effectively handle a limited number of labeled data together with a quantity of unlabeled data and overcomes the shortcomings of the two-step strategy. Experiments on the AR, Extended Yale B, Multi-PIE, and ORL datasets demonstrate that the recognition rate of the proposed method is significantly higher than those of other methods, particularly when supervised samples are scarce and the feature dimension after reduction is low. The convergence of the proposed iterative algorithm is confirmed experimentally. Features extracted with the discriminant projection directions learned by the joint model preserve the information inherent in the data as much as possible after dimension reduction, and enhanced classification results are obtained by categorizing the extracted features with the classifier.
Keywords: canonical correlation analysis (CCA); label prediction; discriminative projection; joint learning; semi-supervised
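As a rough illustration of the alternating strategy this abstract describes, the sketch below interleaves a least-squares surrogate for the CCA projection step with nearest-class-mean label renewal. The function name, the surrogate objective, and all parameters are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def joint_sscca(X, Y, labels, n_classes, n_iter=10):
    """Hypothetical sketch: X, Y are (n, dx)/(n, dy) views; labels is an
    int array with -1 marking unlabeled samples. Label prediction and
    projection learning are updated alternately, as in the abstract."""
    n = X.shape[0]
    unlabeled = labels < 0
    L = np.full((n, n_classes), 1.0 / n_classes)      # soft init for unlabeled rows
    L[~unlabeled] = np.eye(n_classes)[labels[~unlabeled]]
    for _ in range(n_iter):
        # (1) given the label matrix, fit per-view projections that map each
        #     view close to the labels (a least-squares stand-in for CCA)
        Wx = np.linalg.lstsq(X, L, rcond=None)[0]
        Wy = np.linalg.lstsq(Y, L, rcond=None)[0]
        Z = 0.5 * (X @ Wx + Y @ Wy)                   # fused projected features
        # (2) given the projections, renew labels of unlabeled samples by
        #     assigning each to the nearest class mean in projected space
        means = (L.T @ Z) / np.maximum(L.sum(0)[:, None], 1e-9)
        d = ((Z[unlabeled, None, :] - means[None]) ** 2).sum(-1)
        L[unlabeled] = np.eye(n_classes)[d.argmin(1)]
    return Wx, Wy, L
```

The alternation stops when the label matrix no longer changes, which is the convergence behavior the abstract reports experimentally.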
    • Multimodal deep neural network for construction waste object segmentation

      Jianhua Zhang, Jiawei Chen, Shaobo Zhang, Jianshuang Guo, Sheng Liu
      Vol. 24, Issue 7, Pages: 1136-1147(2019) DOI: 10.11834/jig.180560
Abstract: Objective: Construction waste is no longer useless. Recycling the waste generated during construction and converting it into resources and energy has become an excellent mode of economic development that also protects the environment. The construction waste situation in China has become increasingly severe. With urbanization, old buildings have been demolished, rebuilt, or replaced by skyscrapers, and once-inhabited areas have gradually been transformed into ever-expanding cities. These cities develop quickly but entail serious hidden dangers: the construction waste generated by numerous work sites has become difficult to ignore. Urban construction waste refers to all waste generated during the construction, transformation, decoration, demolition, and laying of buildings, structures, and their auxiliary facilities; it mainly includes muck, waste concrete, waste brick, waste pipe, and waste wood. According to statistics, construction waste now accounts for 30% to 40% of total municipal waste in China. In the next 10 years, China will produce more than 1.5 billion tons of construction waste per year on average; by 2020, annual construction waste will reach 2.6 billion tons, and by 2030, 7.3 billion tons. Resource utilization and recycling are therefore inevitable options for dealing with construction waste, and one can start from its characteristics: construction waste is a mixture of various building-material wastes, which are in fact unutilized resources. In the 1990s, several communities in California launched the first single-stream recycling project, in which all paper products, plastics, glass, metal, and other wastes were mixed and then separated into single items by a sorting system. That system combined hardware equipment with manpower; it was not fully automated and relied mainly on human sorting, so it was inefficient. The attempt was nonetheless meaningful because it demonstrated the feasibility of recycling waste. Many construction wastes, such as waste bricks, waste rock, and scrap steel, can be recycled after being sorted, rejected, or crushed, but systems like the single-stream recycling project cannot handle substantial construction waste. With the development of artificial intelligence technology, intelligent robotic equipment can greatly improve the capability, efficiency, and safety of construction waste recycling. Among such equipment, the robotic arm is the most widely used automated mechanical device in industry; it can quickly grasp objects and work continuously, providing a new and efficient solution for the automatic sorting of construction waste. The position and contour information of each object are indispensable to the grabbing task of a robotic arm, so computer image segmentation algorithms are well suited to this scene: through image segmentation, a construction waste image can be accurately segmented to obtain the position and contour of each object. Combining robotic arms with image segmentation algorithms to achieve efficient construction waste recovery is thus worth exploring. However, segmenting construction waste objects from such images is difficult owing to the characteristics of industrial sites and of the objects themselves. To address this difficulty, this study proposes a construction waste object segmentation method based on a multimodal deep neural network, which provides accurate contour and category information of construction waste objects to an automatic sorting system so that the system can realize automatic grabbing with a robotic arm. Method: First, in scenes with severe color degradation, feature learning with RGB images alone does not meet actual needs, so salient features must also be trained with depth information. Second, we treat the RGB image and the corresponding depth image as the input of a deep convolutional neural network, which performs high-dimensional feature learning; the feature maps from the last convolutional layers are weighted, summed, and fed to a softmax classifier, yielding the label allocation probability of each pixel. On the basis of these per-pixel class probabilities, we construct a multi-label, fully connected conditional random field. The unary energy term treats each pixel as an independent item without considering inter-pixel relationships, whereas the binary energy term represents the relationships among pixels, so that similar pixels are assigned to the same category and strongly differing pixels to different categories, which smooths the segmented edges and yields accurate segmentation results. According to the actual situation, we propose an energy function suitable for construction waste objects; the global optimal solution obtained by minimizing this energy function segments the objects in the image, generating an independent segmentation block for each type of construction waste object. Finally, fine segmentation of local ambiguous regions is performed according to depth gradient information. Ambiguous areas are the adhesion areas between construction waste objects that are difficult to distinguish owing to degraded visual characteristics. The depth gradient information yields a local depth edge map, from which the local ambiguous area is extracted; within it, the algorithm extracts the effective internal edges to separate adhering objects of the same class. Result: On the construction waste image test set, our method achieves 90.02% mean pixel accuracy and 89.03% mean intersection over union. Compared with several excellent semantic segmentation algorithms, the experimental results show that the proposed method performs better and improves segmentation accuracy. Conclusion: The proposed algorithm can effectively segment and classify most construction waste objects simultaneously and can provide accurate contour and classification information to an automatic sorting system, thereby facilitating the automatic grasping of construction waste by robotic arms.
Keywords: multimodal information; construction waste object segmentation; convolutional neural network; conditional random field; depth gradient
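The weighted fusion and per-pixel softmax step described in the Method section might look like the minimal sketch below. The two branch score maps and the fusion weights are assumed inputs, and the conditional random field stage that consumes the resulting probabilities is not shown.

```python
import numpy as np

def fuse_and_classify(feat_rgb, feat_depth, w_rgb=0.6, w_depth=0.4):
    """feat_rgb, feat_depth: (H, W, C) per-pixel class scores from the RGB
    and depth branches of the network. The branch outputs are weighted,
    summed, and passed through a per-pixel softmax, as the abstract
    describes; the weights here are placeholders."""
    fused = w_rgb * feat_rgb + w_depth * feat_depth
    e = np.exp(fused - fused.max(axis=-1, keepdims=True))  # numerically stable softmax
    prob = e / e.sum(axis=-1, keepdims=True)               # (H, W, C) label probabilities
    # hard labels plus the probabilities that would serve as CRF unaries
    return prob.argmax(axis=-1), prob
```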

      Image Understanding and Computer Vision

    • Image corner detection using recursively maximum point-to-chord distance

      Yunhong Li, Yarui He, Weichuan Zhang, Xiaoji Zhou
      Vol. 24, Issue 7, Pages: 1148-1159(2019) DOI: 10.11834/jig.180564
Abstract: Objective: Corners in images carry critical information for describing object features and play a crucial, irreplaceable role in computer vision and image processing systems. Many computer vision tasks rely on successful corner detection, including 3D reconstruction, stereo matching, image registration, motion estimation, and object tracking. However, no strict mathematical definition of a corner exists; corners are usually defined as points with low self-similarity, as locations where intensity varies strongly in all directions, as image points at local maxima of curvature along an edge contour, or as intersections of two or more edge curves. Vision researchers have proposed many promising corner detection methods based on these definitions. However, traditional contour-based corner detection algorithms must calculate the curvature of each edge pixel and are sensitive to noise and local variations, which makes the detection results unstable. This study therefore proposes a novel image corner detection approach based on a recursive point-to-chord distance. Method: This study analyzes state-of-the-art corner detection algorithms and then proposes a new method. First, each edge contour is extracted from the input image with the Canny edge detector, one of the most widely used detectors in contour-based corner detection and a standard gauge in edge detection; a pixel is marked as an edge pixel when the gradient magnitudes on either side of it are lower than its own. The output contours may, however, contain small gaps, and these gaps may contain corners. Second, the curves are smoothed with three different Gaussian kernels. For each smoothed curve at a given Gaussian scale, the ends of the curve are connected to form a chord; the distance between each edge pixel of the contour and the chord is calculated, and the pixel with the longest distance is marked as a candidate corner. This pixel divides the original contour into two edges, and connecting it to the ends of the contour yields two new chords. The point-to-chord distances are recalculated and compared with a threshold, and points exceeding the threshold are selected as candidate corners. Finally, a multi-scale technique is applied to the candidate corner set to obtain the final corners. Result: Compared with existing corner detection algorithms based on curvature calculation, the proposed algorithm does not need to compute first and second derivatives, effectively avoids the calculation error caused by local variation, and is highly robust to noise. The four compared corner detectors achieve their highest average repeatability under JPEG quality compression and their worst localization error under shear transformation. The proposed and CPDA corner detectors perform better than the other detectors under geometric transformations. Under JPEG quality compression and Gaussian noise, the proposed method achieves higher average repeatability and lower localization error than the three other detectors. Experimental results show that the proposed detector attains the best overall performance. Conclusion: The proposed detector needs neither to accumulate distances from a moving chord nor to compute an accumulation over every point on a curve, thereby achieving good speed while keeping good average repeatability and accuracy. Compared with the three classic detectors of Harris, CPDA, and He and Yung, it attains better average repeatability and localization error under affine transforms, JPEG compression, and Gaussian noise. Existing corner detection methods can be broadly classified into intensity-, model-, and contour-based methods. Intensity-based methods aim to extract local gray variation and structural information effectively; model-based methods extract corners by fitting the local image to a predefined model; contour-based methods obtain the image's planar curves with an edge detector, smooth the curves with a Gaussian function, compute the corresponding curvatures, and mark points of local curvature maxima, line intersections, or rapid changes in edge direction as corners. Each category has strengths and weaknesses, and their defects in practical application keep corner detection a research hotspot in computer vision and image processing. Experiments show that the proposed corner detector is more robust than the three other classical detectors and has good detection performance. Future work may continue to improve its detection performance and apply it to further computer vision studies.
Keywords: corner detection; multi-scale; point-to-chord distance; curvature; affine transforms; average repeatability; localization error
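A minimal sketch of the recursive point-to-chord splitting on a single open contour follows, assuming the contour is an (n, 2) array of pixel coordinates already extracted by the Canny detector. Gaussian smoothing at three scales and the multi-scale verification step are omitted, and the threshold value is a placeholder.

```python
import numpy as np

def recursive_point_to_chord(curve, thresh=3.0):
    """curve: (n, 2) float array of contour points. Returns candidate
    corners found by recursively splitting at the point farthest from
    the chord joining the current segment's endpoints."""
    corners = []

    def split(a, b):                               # index range [a, b] into curve
        if b - a < 2:
            return
        p, q = curve[a], curve[b]
        cx, cy = q - p                             # chord direction
        norm = np.hypot(cx, cy) + 1e-12
        pts = curve[a + 1:b] - p
        # perpendicular distance of every interior point to the chord p-q
        d = np.abs(cx * pts[:, 1] - cy * pts[:, 0]) / norm
        i = a + 1 + int(d.argmax())
        if d.max() > thresh:                       # farthest point becomes a candidate corner
            corners.append(tuple(curve[i]))
            split(a, i)                            # re-test the two new sub-chords recursively
            split(i, b)

    split(0, len(curve) - 1)
    return corners
```

Because only maxima of a distance-to-chord are tracked, no derivative or curvature computation is needed, matching the speed argument the abstract makes.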
    • Qingsong Li, Xudong Zhang, Jun Zhang, Xinjian Gao, Jun Gao
      Vol. 24, Issue 7, Pages: 1160-1175(2019) DOI: 10.11834/jig.180620
Multilateral adaptive depth image super-resolution reconstruction via RGB-D structure similarity measure
Abstract: Objective: Depth cameras can capture depth images of dynamic scenes in real time, which gives them unique advantages in depth information acquisition. However, depth images are often noisy, have low spatial resolution, and contain areas with missing depth values. Depth and color are two complementary descriptions of the same scene and therefore have a strong structural correlation: depth discontinuities often coincide with color transitions. Because the edges of low-resolution depth images are blurred, this color-depth correlation can be exploited for depth image reconstruction, and using a high-resolution color image as a reference is an important approach to reconstructing a high-resolution depth image. However, color images contain rich texture regions that are absent from depth images. Among the most challenging problems in color-guided depth reconstruction is the inconsistency between color edges and depth discontinuities in textured regions: simply passing structural information from the color image to the target image can introduce significant errors. Existing methods tend to consider only the color image and ignore its correlation with the depth image, handle this inconsistency ineffectively, and thus produce texture copying artifacts and even blurred depth edges. In this paper, we propose a color-guided depth image super-resolution reconstruction algorithm that is robust to this inconsistency. Method: We propose an RGB-D structure similarity measure that uses the structural correlation between the color and depth images to predict the color edges most likely to coincide with depth discontinuities. To measure the inconsistency effectively, we examine local structural gradients rather than the gradient magnitudes of individual pixels; the resulting measure is less affected by color texture. We use the proposed RGB-D structure similarity measure as an adaptive selection indicator for image patches, as it effectively reflects the discontinuity of depth edges. A conventional image patch is centered on the pixel to be estimated, but when that pixel lies in a depth edge region, the depth estimate is blurred by the changing depth gradients nearby. In contrast, we select, among all patches in the pixel's neighborhood, the optimal patch least likely to contain prominent depth edges, which helps preserve sharp depth edges. Multilateral guided estimation of the depth value is then performed in the selected optimal patch. Exploiting the nonlocal characteristics of the color and depth images, we propose an oriented nonlocal means weighting scheme that uses high-quality structural gradients and directional information. This weighting scheme is combined with spatial and range kernels as the multilateral guidance for depth estimation, which effectively resolves the structural inconsistency, preserves depth discontinuities, and is robust to depth holes. Finally, the three bandwidth parameters of the multilateral guidance weighting scheme are important to our reconstruction model. The proposed RGB-D structure similarity measure is related to depth image smoothness, corresponds to depth discontinuity, and is little affected by incoherent texture. Small bandwidth parameters preserve depth discontinuities effectively but smooth noise poorly, whereas large bandwidth parameters smooth noise effectively but may blur depth discontinuities. We therefore adaptively adjust the multilateral guidance weight parameters according to the RGB-D structure similarity measure to achieve robust depth image reconstruction. The framework of our depth image super-resolution reconstruction is based on this multilateral guidance, and the correspondence between the RGB-D structure similarity measure and image smoothness is used to adaptively select the position of the neighborhood patch and the size of the guidance weight parameters. Result: Quantitative and qualitative evaluations show that our method performs better than other state-of-the-art methods on the Middlebury synthetic, ToF real, and Kinect real datasets as well as on our own dataset. Our method effectively suppresses texture copying artifacts, restores depth hole images, and preserves depth discontinuities. We use the mean absolute difference, the commonly used evaluation metric for depth image reconstruction; the mean absolute difference of the proposed method is lower than that of the suboptimal algorithm by approximately 63.51%, 39.47%, and 7.04% on the Middlebury, ToF, and Kinect datasets on average. Furthermore, as the up-sampling factor increases, the advantages of our reconstruction become more evident because we fully utilize the structural information of the color image, whereas the other methods cannot overcome the influence of color textures once the depth information is no longer reliable. For depth hole images, most previous methods can only restore the depth image without increasing its resolution, or must perform the two tasks separately; our method restores the depth hole image and performs super-resolution reconstruction together, and experiments on the NYU raw dataset verify its effectiveness. Conclusion: Our method effectively handles the inconsistency between color edges and depth discontinuities in color-guided depth image super-resolution reconstruction and effectively restores depth holes. In particular, it preserves depth discontinuities effectively not only on synthetic datasets but also on real-world depth datasets.
Keywords: depth image; super-resolution; RGB-D structure similarity measure; multilateral guidance; adaptive model
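In its simplest form, multilateral guidance reduces to a joint-bilateral-style weighted average of nearby low-resolution depth samples. The sketch below shows only the spatial and color-range kernels, assuming a float RGB guide image in [0, 1]; the paper's oriented nonlocal-means term, optimal-patch selection, and adaptive bandwidths are left out, and all parameters are placeholders.

```python
import numpy as np

def multilateral_upsample(depth_lr, color_hr, scale, sig_s=3.0, sig_c=0.1, r=6):
    """depth_lr: (h, w) low-resolution depth; color_hr: (H, W, 3) float RGB
    guide with H = h*scale, W = w*scale. Each HR depth pixel is a weighted
    average of LR depth samples within radius r, with spatial and
    color-range Gaussian kernels."""
    H, W = depth_lr.shape[0] * scale, depth_lr.shape[1] * scale
    out = np.zeros((H, W))
    ys, xs = np.mgrid[0:depth_lr.shape[0], 0:depth_lr.shape[1]]
    cy, cx = ys * scale + scale // 2, xs * scale + scale // 2  # LR sample centres on the HR grid
    for y in range(H):
        for x in range(W):
            m = (np.abs(cy - y) <= r) & (np.abs(cx - x) <= r)  # LR samples in the window
            ws = np.exp(-((cy[m] - y) ** 2 + (cx[m] - x) ** 2) / (2 * sig_s ** 2))
            wc = np.exp(-((color_hr[cy[m], cx[m]] - color_hr[y, x]) ** 2).sum(-1)
                        / (2 * sig_c ** 2))
            w = ws * wc
            out[y, x] = (w * depth_lr[m]).sum() / (w.sum() + 1e-12)
    return out
```

The adaptive behavior the abstract describes would enter here by scaling `sig_s` and `sig_c` per pixel according to the RGB-D structure similarity measure rather than keeping them fixed.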
    • Di Jia, Yuxiu Li, Mingyuan Zhao, Ningdan Zhu
      Vol. 24, Issue 7, Pages: 1176-1187(2019) DOI: 10.11834/jig.180665
      Line feature correction and purification for matching straight lines of image pair
Abstract: Objective: Linear feature matching is an important research topic in computer vision. Existing matching methods exhibit different degrees of mismatching, mainly because existing line detection results are not located at the real edges of the image and because local consistency checking for matched lines is lacking. For this reason, a line feature correction and purification method for matching straight lines between image pairs is proposed. Method: First, the edge features of the image pair are extracted to obtain a binarized edge map, and the edge gradient map and gradient vector flow are used to establish a gradient gravitational map. Second, a line detection method extracts the straight-line features of the pair, and new endpoints are determined by shortening the endpoints of these lines; the gravitational map is then used to recompute the positions of the new endpoints, creating new straight lines. A method of extending the new lines is proposed to keep their length consistent with the original lines. Finally, the point feature matching result is used to compute the epipolar lines of the image pair, which are combined with the line matching result to determine the local feature area for the final check; mismatched lines are eliminated by randomly sampling feature matches in a small neighborhood to verify each line match. Result: Three different image pairs were selected for the experiments. A wide-baseline stereo pair was selected for the first experiment. The results show that most straight lines on the ground fit the ground pattern imperfectly; after correction, these lines were pulled to the edges of the pattern. The lines of doors and windows, which were offset or inclined inward, were likewise corrected to the edges of the doors and windows. Analysis of the experimental data indicated many wrong matches in the results of the direct line matching algorithm, whereas the line correction of our method improved the matching accuracy from 50% to 84%; although a few wrong matches remained, refining the matching result with our purification step yielded 100% matching accuracy. Another wide-baseline image pair was selected for the second experiment. The lines extracted near the edge of a flowerpot showed gaps between the actual edge and the extraction result; our method relocated these lines onto the edge of the flowerpot. The deviation of line positions was most evident near the window frame, and the corrected lines were pulled back to the correct positions. Analysis of the experimental data indicated that the matching accuracy again increased by 30%. An image pair with scale change was selected for the third experiment. The lines were detected regularly, but the horizontal and vertical lines deviated from their real positions; this problem was solved after correction, and some tilted lines were also pulled back to the real edges. Only a slight scale difference existed in the third image pair, and the matching accuracy increased from 92% to 100%, consistent with the previous experiments. Conclusion: This work aims to improve the registration accuracy of image pairs. A general method of rectifying and purifying straight-line features, together with its implementation process, is proposed. A gradient gravitational map is built by combining the gradient map and the gradient vector flow of the edge map, and the positions of straight lines are corrected accordingly. The epipolar lines obtained from matched points, together with the line matching results, are used to check regional similarity and eliminate wrong matches. Three image pairs were tested for linear registration rate, alignment rate after line correction, and the matching result after purification. The experimental results show that the average matching accuracy increased by approximately 30%. The proposed method can greatly improve line feature matching accuracy, can easily correct and purify the results of other line matching methods, and exhibits high practicability.
Keywords: straight line feature; gradient vector flow; gradient gravitational map; epipolar line; random sampling
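As a crude stand-in for the gradient gravitational map, the sketch below snaps each (already shortened) line endpoint to its nearest binary edge pixel via a Euclidean distance transform. The paper instead builds the attraction field from the edge gradient map and gradient vector flow; the data layout of `lines` is an assumption.

```python
import numpy as np
from scipy import ndimage

def snap_endpoints(lines, edge_map):
    """lines: list of ((x1, y1), (x2, y2)) integer endpoints; edge_map:
    (H, W) binary edge image. Each endpoint is moved to the nearest edge
    pixel, a simplified version of letting the gravitational field pull
    the shortened line back onto the real image edge."""
    # indices of the nearest edge pixel for every image position: the
    # transform measures distance to the nearest zero, so invert the mask
    _, (iy, ix) = ndimage.distance_transform_edt(~edge_map.astype(bool),
                                                 return_indices=True)
    snapped = []
    for (x1, y1), (x2, y2) in lines:
        snapped.append(((ix[y1, x1], iy[y1, x1]),
                        (ix[y2, x2], iy[y2, x2])))
    return snapped
```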
    • Hierarchical structure histogram combined with color name

      Chenchen Yue, Zhiqiang Hou, Wangsheng Yu, Sugang Ma
      Vol. 24, Issue 7, Pages: 1188-1196(2019) DOI: 10.11834/jig.180617
Abstract: Objective: The extraction of image features plays an important role in visual tracking and is an essential component of image matching and other image processing applications; different image description methods directly affect algorithm performance. Researchers at home and abroad have proposed numerous image features, which fall into two classes: 1) deep features based on deep learning, which achieve strong results but require substantial training data and place heavy demands on the experimental platform, greatly restricting their applications; and 2) traditional manual features, which can run on any existing platform, are simple and intuitive, and have achieved remarkable results in image processing; examples include the scale-invariant feature transform, the histogram of oriented gradients, and color names (CN). Further study of manual features therefore remains essential. Nevertheless, improving algorithm performance with a single feature alone is difficult. In this study, a hierarchical structure histogram combined with CN is proposed to overcome two problems: a single color feature is susceptible to illumination changes, leading to poor robustness, and the spatial structure of the image is sensitive to target deformation, which reduces feature distinguishability. Method: To address the disadvantage that layering an image by pixel gray value is susceptible to illumination changes, an improved CN-based layering method is proposed. First, the method projects the original RGB color space into a robust color name space, representing objects by a probabilistic 11-dimensional map; that is, the input image is stratified into 11 layers according to CN. Second, each pixel is projected into the layer with the highest probability, so the layers are mutually disjoint and together compose the entire image; each pixel belongs to exactly one layer. Furthermore, for every layered image, the method obtains the spatial distribution of pixels in each dimension by counting the pixels in each cell of a structural image element. Finally, the pixel spatial information of each slice is concatenated into a hierarchical histogram that represents the image. Result: Two experiments are performed to prove the validity and strong distinguishability of the proposed feature. The first is image matching, which follows an existing template-matching strategy: the position of the matched image is determined by exhaustively traversing the original image. The image matching dataset is from PASCAL VOC2007 (visual object classes), which contains various object classes (e.g., person, bird, car, and dog); only the target indicated by the first ground truth of each image is used. The second experiment is visual tracking within a particle filter framework with 200 particles, evaluated on the 100 sequences of the object tracking benchmark (OTB100), which covers 11 challenges (e.g., out-of-plane rotation, scale variation, and illumination variation) that may be encountered in object tracking. All experiments run on the Windows platform in a MATLAB development environment. We compare the proposed feature against four traditional manual features. The experimental results show that image matching based on this feature locates the target accurately, distinguishes similar targets well, and produces a distinct response peak at the target; the mean peak-to-sidelobe ratio increases by 1.3479. Moreover, object tracking based on this feature considerably reduces the center location error and markedly improves precision and success rate: the success rate increases by 4% and the precision by 4.6%. Conclusion: A hierarchical structure histogram based on pixel gray values is easily affected by illumination changes and target rotation, so this study adopts a new image layering method to obtain robust features. The proposed feature combines CN features with the spatial information of pixels, improving its adaptability to scenes such as illumination change, deformation, and low resolution; it effectively remedies the poor discrimination of a single feature and is more robust in image matching and object tracking, especially for objects with the same color distribution but different spatial pixel distributions. In addition, the feature retains the representation of traditional histogram features, keeping computation and similarity measurement simple. Compared with four traditional manual features, it achieves better image matching and visual tracking results in most cases and can thus be effectively applied in image processing applications. The feature still has deficiencies, such as not considering target scale change during visual tracking; in follow-up work, we will further optimize it for stronger generalization in visual tracking while remaining applicable to other image processing applications.
Keywords: image matching; visual tracking; color name (CN); spatial information; hierarchical structure histogram
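A minimal sketch of the layering and counting steps follows, assuming a precomputed 32x32x32x11 color-name lookup table (such as the standard mapping of van de Weijer et al.) is available as `cn_table`; the cell size is a placeholder.

```python
import numpy as np

def cn_structure_histogram(img, cn_table, cell=8):
    """img: (H, W, 3) uint8 RGB image. cn_table: (32, 32, 32, 11) array
    giving the 11 color-name probabilities per quantized RGB bin. Each
    pixel is assigned to its most probable CN layer, and a per-layer
    grid count encodes the spatial distribution of pixels."""
    q = (img // 8).astype(int)                          # quantize 256 -> 32 bins per channel
    layer = cn_table[q[..., 0], q[..., 1], q[..., 2]].argmax(-1)  # (H, W) in 0..10
    H, W = layer.shape
    hist = []
    for k in range(11):                                 # one slice per color name
        mask = (layer == k).astype(float)
        # count pixels of this layer inside each cell of a coarse grid
        grid = mask[:H - H % cell, :W - W % cell]
        grid = grid.reshape(H // cell, cell, W // cell, cell).sum((1, 3))
        hist.append(grid.ravel())
    return np.concatenate(hist)                         # hierarchical structure histogram
```

Because the 11 slices are disjoint, concatenating their grid counts preserves both the color-name statistics and the spatial layout, which is what lets the feature separate objects with identical color distributions but different pixel arrangements.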

      Remote Sensing Image Processing

    • Ping Han, Tinghua Song
      Vol. 24, Issue 7, Pages: 1197-1206(2019) DOI: 10.11834/jig.180503
      Aircraft target detection of PolSAR image combined with regional screening and multi-feature discriminant
Abstract: Objective: Few studies on target detection in PolSAR images are available worldwide for reference. In fully polarimetric SAR images of complex scenes, aircraft target detection is difficult in several respects. On the one hand, the background includes not only the airport but also areas such as city, forest, mountain, ocean, and road, each with different statistical characteristics, so fitting the whole background with a single statistical model is impossible. On the other hand, the polarization characteristics of vehicles, ships, and some small buildings are very similar to those of aircraft targets in SAR images, so aircraft are difficult to distinguish from other targets with a single feature. Moreover, given the resolution of the PolSAR data produced by the imaging system, the shape of an aircraft target cannot be presented in the image and can only be expressed through pixel features. These problems complicate aircraft target detection in PolSAR images. Analysis shows that aircraft targets exhibit several characteristics in PolSAR images: 1) their scattering power is generally higher than that of the surrounding background; 2) they are usually parked in fixed, homogeneous areas such as airports and aprons; and 3) they appear as pixel blocks. Method: An aircraft target detection algorithm combining regional screening with multi-feature discriminants is proposed to solve the above problems using prior knowledge. First, image preprocessing minimizes the effects of speckle and of random target orientation on the original PolSAR image. Second, the regions of interest (the runway, the tarmac, and regions with similar scattering properties) are extracted according to the image power values, and suspected aircraft are then extracted by the area of their connected domains. Finally, prior knowledge indicates that the power of an aircraft target is relatively large, the scattering power of the airport area is comparatively small, the tail and wing roots of an aircraft show dihedral structural features, and aircraft usually appear in airport or apron areas; the suspected aircraft are therefore screened by different characteristics, namely power cross entropy, background homogeneity, and power difference. Result: Experiments were performed on polarimetric SAR data of Half-Moon-Bay, Kahului, and Kona acquired by the AIRSAR and UAVSAR systems of NASA laboratories in the United States. Because few documents on aircraft detection in PolSAR images are available, the experiment was compared with only one other method. On Half-Moon-Bay, both methods accurately detect the aircraft targets, but the proposed method produces seven false alarms, whereas the comparison method produces 22. On Kahului, both methods detect four aircraft targets, but the proposed method produces four false alarms, whereas the comparison method produces 17. On Kona, the proposed method detects 13 of 15 aircraft with six false alarms, whereas the comparison method detects six of 15 with 17 false alarms. The time spent in the experiments shows that the algorithm is computationally efficient. Conclusion: The method extracts suspected aircraft targets, eliminates false targets by fusing different features, and then obtains the final detection results. The algorithm does not need to extract the airport runway and apron area, thereby avoiding the inaccurate detection caused by incomplete extraction of the apron area. The final results contain few missed detections; some false alarms are generated, but the proposed method simultaneously produces fewer false alarms and missed detections than the comparison method. It also greatly improves operational efficiency because it traverses only the extracted suspected targets rather than all pixels. The algorithm nonetheless needs improvement, for example in controlling false alarms: the false alarms generated are small buildings, vehicles, and ships whose PolSAR characteristics resemble those of aircraft. Parameter selection also cannot yet be made fully adaptive, because the background contains not only the airport but also urban, forest, mountain, and ocean areas whose statistical properties cannot be fitted with a single distribution. In addition, when two targets are close to each other, the target background area obtained by morphological dilation may include the target to be detected, which can influence the result. These problems must be solved in future work.
Keywords: PolSAR image; aircraft target detection; regional screening; polarization cross entropy; homogeneity; power difference
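The regional screening stage might be sketched as below: high-power pixels are grouped into connected components and filtered by area to yield suspected targets. The power and area thresholds are placeholders, and the subsequent multi-feature discriminants (power cross entropy, background homogeneity, power difference) are not implemented here.

```python
import numpy as np
from scipy import ndimage

def screen_candidates(span, power_thresh, min_area=20, max_area=400):
    """span: (H, W) total scattering power of the PolSAR image. Pixels
    above the power threshold are grouped into connected components, and
    components whose area matches a plausible aircraft pixel-block size
    are kept as suspected targets for later discriminant screening."""
    labeled, n = ndimage.label(span > power_thresh)
    keep = np.zeros_like(labeled, dtype=bool)
    for region in range(1, n + 1):
        area = (labeled == region).sum()
        if min_area <= area <= max_area:       # pixel-block footprint of an aircraft
            keep |= labeled == region
    return keep
```

Operating only on these candidate components, instead of every pixel, is what gives the method the efficiency gain the abstract reports.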