Latest Issue

    Vol. 23, No. 7, 2018

      Review

    • Chunhua Jia, Xiaoying Guo, Ruyi Bai
      Vol. 23, Issue 7, Pages: 937-952(2018) DOI: 10.11834/jig.170626
      Review of feature extraction methods and research on affective analysis for paintings
      Abstract: Objective: Image classification and affective analysis are popular topics in computer vision; they provide effective methods for the digital study of paintings and play important roles in art protection and painting innovation. Brush strokes, colors, shapes, textures, and white space are important visual features of paintings. Classification based on these visual features can help identify painting style and painter, analyze painting affection, and further clarify the meaning of painting creation and the cultures that paintings inherit. To better support research and innovation in painting, this paper provides a comprehensive survey and analysis of current domestic and international research on painting classification and affective analysis.

      Method: On the basis of extensive literature research, the paper first describes the different representation modes of Chinese and western paintings and then analyzes the reasons for the differences between these two modes. The differences derive mainly from different cultural backgrounds and ways of thinking. Traditional Chinese painting has a unique artistic expression technique, which accounts for the artistic "false or true complement" effect produced by the skillful use of white space. In addition, traditional Chinese painting attaches importance to the combination of calligraphy and poetry, is decorated with seals, and emphasizes the connection between art and nature to present spirit through form. Line is the basic modeling approach, and color is an auxiliary characteristic of traditional Chinese painting; bright and dark changes of light and shadow are not emphasized. Line is also a form of expression of the affective characteristics of traditional Chinese painting. For an appreciator, traditional Chinese painting requires more association and imagination than mere visual effect. The Chinese painting style mainly includes two major categories, namely, traditional ink paintings and murals. The ink-and-wash style is mainly distinguished by representative painters, such as Qi Baishi, Zheng Banqiao, Xu Beihong, Wu Guanzhong, Wu Changshuo, and Huang Gongwang. Mural research is represented by the Mogao Grottoes in Dunhuang, which have a distinctive national style characterized by rich and colorful content and a form of painting that embodies people's good wishes. Western painting is distinct from traditional Chinese painting. Traditional western painting highlights realism, similarity in appearance, reproduction, space-time, and the effect of light and color. Western painting attaches great importance to changes in color, light, and shadow to portray images. In addition, the elaborate and tactful use of color can reflect painting affection. An entire painting exhibits a good sense of texture and space because of object shading. Contrary to traditional Chinese painting, western painting provides viewers with more visual effects. The western painting style is associated with the development of literature and art movements, and its formation is mainly divided into Baroque, Cubist, Impressionist, Romantic, Rococo, and Renaissance styles. Second, the paper outlines the machine learning methods commonly used in painting classification, namely, support vector machines, decision trees, artificial neural networks, and deep learning, and analyzes the advantages and disadvantages of these methods. Moreover, this paper systematically analyzes and summarizes the current literature, focusing on two aspects: feature extraction and classification of paintings, and affective analysis of paintings.

      Result: On the basis of the related literature, this paper sums up the painting databases commonly used in current studies. Moreover, it reviews in detail the research status and development of feature extraction techniques and classification methods for Chinese and western paintings in terms of characteristics such as brush strokes, color features, shape features, texture features, and white-space features. It also briefly describes the commonly used evaluation methods for painting classification models, namely, error rate and accuracy, precision and recall, the P-R curve and F1 measure, and the ROC curve and AUC, and analyzes several evaluation indexes commonly used in current studies. Computer vision technology has a distinct advantage in object recognition, scene classification, image classification, and affective and semantic image analysis, as it can meaningfully evaluate the perception of target images and scenes by simulating human visual ability. The essential features of an image are the key to accurate judgment. Because paintings are image resources, the selection of painting characteristics is important in painting classification and affective analysis. Closely related to painting feature selection and classification research, machine learning investigates how computers can simulate or realize human learning activities to acquire new knowledge and skills, and it is widely used in artificial intelligence. Furthermore, the paper probes the affective investigation of western painting on the basis of color features and provides an efficient idea for the affective analysis of traditional Chinese painting. Paintings can reflect objective social life and the rich affection of their painters, and using computational intelligence to analyze painting affection can help us understand the history and culture of various periods. Chinese painting has a unique style in terms of ink brush strokes, ink shades, white space, and painting content, which can also convey the different affections that painters aim to express. Combining cognitive and psychological knowledge with image feature extraction methods can also realize the affective analysis of Chinese ink-and-wash paintings. Finally, with respect to painting classification databases, classification goals, and the limitations of the affective analysis of paintings, this study highlights the existing problems and challenges in painting classification and affective analysis and discusses possible solutions.

      Conclusion: As painting is an important cultural achievement of humanity, more painting research algorithms and exploratory ideas will emerge in the future. This article can provide guidance for further studies on painting classification, especially research on the affective analysis of traditional Chinese ink painting and on painting art creation.
      Keywords: Chinese and western paintings; painting database; feature extraction; classification method; evaluation method; painting affection; color and affective analysis
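      As a minimal, self-contained illustration of the evaluation measures named in the review (error rate/accuracy, precision, recall, and F1), the following Python sketch computes them for a hypothetical binary painting-style classifier; the labels and predictions are invented for the example, not data from any surveyed study.

```python
import numpy as np

# Hypothetical ground-truth labels and predictions for a binary
# painting-style classifier (1 = ink-and-wash, 0 = other).
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0, 1, 0])

tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
accuracy = np.mean(y_pred == y_true)        # 1 - error rate
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f} acc={accuracy:.2f}")
```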

      Image Processing and Coding

    • Zhongjie Zhu, Yuer Wang, Gangyi Jiang
      Vol. 23, Issue 7, Pages: 953-960(2018) DOI: 10.11834/jig.170533
      Multi-mode shape coding for 3D video
      Abstract: Objective: Three-dimensional (3D) video has attracted considerable attention from the image processing community because of its satisfactory performance in various applications, including 3D television, free-view television, free-view video, and immersive teleconferencing. Compared with traditional block-based techniques, object-based methods have the merits of flexible interactivity and efficient resource usage and are thus favored in many practical applications. Hence, object-based 3D video is an important development trend, and efficient shape coding is a key technique in its practical applications. Shape coding has been studied extensively, and many methods have been proposed. However, most existing shape coding methods were designed for image or video shape coding and seldom for 3D video shape coding. A straightforward approach to encoding the shapes of 3D video objects is to use the same techniques as for image or video objects. However, such coding does not fully exploit the redundancy within 3D shape videos, thus resulting in poor coding efficiency. Strong inter-frame redundancy exists across the time and view directions in a 3D video sequence. Therefore, most existing 3D video coding schemes jointly adopt motion-compensated prediction (MCP) and disparity-compensated prediction (DCP) techniques to achieve high coding efficiency. 3D video and 3D shape video share certain similarities; thus, correlations may also exist among object contours across the time and view directions, and these correlations may be exploited in shape coding to improve coding efficiency. Hence, with this speculation and in consideration of the requirements of practical object-based 3D video applications, an efficient multi-mode 3D video shape coding scheme is proposed in this study. The scheme is based on contour and chain representation, and the correlation among object contours across the time and view directions is exploited to achieve high coding efficiency.

      Method: For a given 3D shape image, the contours of visual objects are first extracted and preprocessed frame by frame to obtain perfect single-pixel width; that is, the object contours are 8-connected, and only one path exists between any two neighboring contour points. A new metric called shape activity is then applied to assess the shape variation of objects within each frame. On the basis of this assessment, the frames are classified into two categories: intra-coding frames and predictive inter-coding frames. If the shape activity within a frame is large, then intra-coding is implemented; otherwise, predictive inter-coding is conducted. An intra-coding frame is encoded on the basis of linearity and direction constraints within chain links to achieve high coding efficiency. An inter-coding frame is encoded using one of three coding modes, namely, contour-based MCP, DCP, or joint MCP and DCP, to efficiently remove the intra-view temporal correlation and the inter-view spatial correlation among object contours and thus improve coding efficiency. The principles of MCP and DCP for 3D shape video are similar to those for 3D video. However, the correlation among object contours is dissimilar to that between video textures. In 3D video, the textures are two-dimensional, whereas an object contour is one-dimensional. Video textures can generally be viewed as rigid, whereas the shape of an object contour often changes irregularly. A small variation of an object contour in a frame may considerably decrease the correlation between consecutive frames. In addition, contour correlations may decrease more quickly than texture correlations as the time interval increases. Hence, conventional prediction techniques for 3D video are unsuitable for 3D shape video. In our coding scheme, a new prediction structure is developed to effectively exploit the intra-view temporal correlation and the inter-view spatial correlation among object contours and thus efficiently encode 3D shape video.

      Result: Experiments are conducted to evaluate the performance of the proposed scheme, and partial comparison results with several well-known methods are presented. The experimental results show that our scheme outperforms classic and state-of-the-art methods, and the average compression efficiency can be improved by 9.3% to 64.8%.

      Conclusion: The proposed scheme can effectively remove the intra-view temporal correlation and the inter-view spatial correlation among object contours. The scheme achieves coding efficiency that exceeds those of existing methods and has potential in many object-based image and video applications, such as object-based coding, editing, and retrieval.
      Keywords: three-dimensional video; shape coding; multi-mode coding; predictive coding; chain code
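      As a rough sketch of the contour-and-chain representation the scheme builds on, the snippet below implements a generic 8-connected Freeman chain code over a single-pixel-wide contour; it is a simplified stand-in for illustration only, not the authors' coder with its linearity and direction constraints or its MCP/DCP modes.

```python
import numpy as np

# Generic 8-connected Freeman chain code: each link stores the direction
# index (0..7) of the offset from one contour point to the next.
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
           (0, -1), (1, -1), (1, 0), (1, 1)]
DIR = {off: d for d, off in enumerate(OFFSETS)}

def chain_code(contour):
    """Encode an ordered list of 8-connected (row, col) contour points."""
    return [DIR[(r2 - r1, c2 - c1)]
            for (r1, c1), (r2, c2) in zip(contour, contour[1:])]

# A short synthetic contour (8-connected, single-pixel wide).
pts = [(5, 5), (5, 6), (4, 7), (4, 8), (5, 9)]
print(chain_code(pts))  # [0, 1, 0, 7]
```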
    • Kaixuan Chen, Xiaojun Wu
      Vol. 23, Issue 7, Pages: 961-972(2018) DOI: 10.11834/jig.170572
      Sparse representation in tangent space for image set classification
      Abstract: Objective: In image set classification, symmetric positive definite (SPD) matrices are usually used to model image sets. The resulting Riemannian manifold yields high discriminative power in many visual recognition tasks. However, most existing classic classification algorithms are designed for Euclidean space and cannot work directly on SPD matrices. To apply Euclidean-space classification algorithms to image set classification, this work comprehensively reviews the Log-Euclidean metric (LEM) of the SPD manifold and the properties of existing classical classification algorithms, and the classification task based on image sets is achieved.

      Method: Given that SPD matrices lie on a Riemannian space, we map the samples on the SPD manifold to the tangent space through logarithm mapping, and each sample in the tangent space corresponds to an image set. A sample in the tangent space has the form of a symmetric matrix whose dimensionality matches that of the samples on the SPD manifold. The symmetric matrices in the tangent space contain redundant information and have a large dimension. To improve the performance and efficiency of the algorithm, we need to reduce the dimensionality of the data in the tangent space. In our technique, we use the Nyström method and (2D)2PCA to obtain low-dimensional data that retain the main information of the image sets. 1) The Nyström method can approximate the infinite-dimensional samples in the reproducing kernel Hilbert space (RKHS). The dimensionality of the samples mapped into the RKHS by kernel mapping is infinite, and the Riemannian kernel is obtained as the inner product of the samples in the tangent space under the LEM of the SPD manifold. For a set of $M$ training samples, the Riemannian kernel matrix $\boldsymbol{K} = [k(\boldsymbol{x}_i, \boldsymbol{x}_j)]_{M \times M}$ can be written as $\boldsymbol{K} \cong \boldsymbol{Z}^{\mathrm{T}}\boldsymbol{Z} = \boldsymbol{V}\boldsymbol{\Sigma}^{1/2}\boldsymbol{\Sigma}^{1/2}\boldsymbol{V}^{\mathrm{T}}$, where $\boldsymbol{Z}_{d \times M} = \boldsymbol{\Sigma}^{1/2}\boldsymbol{V}^{\mathrm{T}}$, $\boldsymbol{\Sigma}$ and $\boldsymbol{V}$ are the top $d$ eigenvalues and eigenvectors of $\boldsymbol{K}$, and $d$ is the rank of the kernel matrix $\boldsymbol{K}$. The projection matrix can be denoted as $\boldsymbol{\Sigma}^{-1/2}\boldsymbol{V}^{\mathrm{T}}$, and the $d$-dimensional vector approximation of a sample $\boldsymbol{y}$ in the RKHS can be written as $\boldsymbol{\Sigma}^{-1/2}\boldsymbol{V}^{\mathrm{T}}(k(\boldsymbol{y}, \boldsymbol{x}_1), \ldots, k(\boldsymbol{y}, \boldsymbol{x}_M))^{\mathrm{T}}$. 2) (2D)2PCA (two-directional two-dimensional PCA) is a well-known dimensionality reduction (DR) technique for two-dimensional data in machine learning and pattern recognition. (2D)2PCA overcomes the limitations of PCA, which works only on one-dimensional data, and of 2DPCA, which reduces only one direction, by performing row- and column-direction DR of two-dimensional data to obtain two projection matrices. In our experiments, the row-direction projection matrix $\boldsymbol{W}_{\mathrm{R}}$ coincides with the column-direction projection matrix $\boldsymbol{W}_{\mathrm{C}}$, that is, $\boldsymbol{W} = \boldsymbol{W}_{\mathrm{R}} = \boldsymbol{W}_{\mathrm{C}}$, where $\boldsymbol{W}_{D \times d}$ is the projection matrix for both directions, because a sample in the tangent space is a symmetric matrix. A sample $\boldsymbol{x} \in \mathbf{R}^{D \times D}$ can be reduced as $\boldsymbol{x}' = \boldsymbol{W}^{\mathrm{T}}\boldsymbol{x}\boldsymbol{W}$, where $\boldsymbol{x}' \in \mathbf{R}^{d \times d}$, and an efficient low-dimensional representation of the sample in the tangent space is achieved. In this way, the SPD matrices are transformed into low-dimensional descriptors of the corresponding image sets. A classical sparse representation classification algorithm, Fisher discrimination dictionary learning in Euclidean space, which has good recognition rates for single images, can then be used to classify the low-dimensional descriptors.

      Result: Our approach is applied to several tasks, including face identification, object classification, and virus cell recognition, with experiments on the YouTube celebrities (YTC), ETH-80, and Virus datasets. Results show that our algorithm has not only a higher recognition rate but also a relatively smaller standard deviation than several classical algorithms for image set classification, such as covariance discriminant learning (CDL) and projection metric learning. On the ETH-80 dataset, our approach achieves a recognition rate of 96.25%, a considerable improvement over discriminative canonical correlations (DCC), CDL, and other classic methods. The standard deviation is only 2.12, which is smaller than those of the other methods; this finding indicates that our method has the best robustness on the ETH-80 dataset. For the YTC dataset, the recognition rate of our method is 78.26%, which is 10% higher than those of the other methods. This result shows the evident advantage of our method on the YTC dataset. The standard deviation is also the smallest, which shows that our method likewise has the best robustness on the YTC dataset. However, for the Virus dataset, our method achieves the highest recognition rate (58.67%) but a large standard deviation, which indicates that the robustness of our method on this dataset is insufficient.

      Conclusion: In this work, we consider the geometrical properties of SPD manifolds and the related Riemannian metric and combine them with classical Euclidean-space classification algorithms to implement image set classification with good results. Experimental results show that the proposed method achieves high accuracy and a generally small standard deviation. Therefore, our method can be widely applied in image set classification. Our future work will focus on constructing a lower-dimensional and more discriminative SPD manifold than that used in this research.
      Keywords: SPD manifold; image set classification; Nyström method; (2D)2PCA; sparse representation
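      The two core steps, the logarithm mapping to the tangent space and the symmetric two-sided reduction x' = W^T x W, can be sketched as follows. This is a minimal illustration under assumed data: the covariance of random frames stands in for an image set model, and a hypothetical eigenvector-based W replaces the learned (2D)2PCA projection.

```python
import numpy as np
from scipy.linalg import logm

# Model an image set by its covariance (an SPD matrix), map it to the
# tangent space with the matrix logarithm (Log-Euclidean view), then
# reduce it with a symmetric two-sided projection x' = W^T x W.
rng = np.random.default_rng(0)
frames = rng.standard_normal((50, 20))                 # 50 frames, 20-dim features
C = np.cov(frames, rowvar=False) + 1e-6 * np.eye(20)   # SPD model of the set

X = logm(C).real                                       # tangent-space symmetric matrix

# Hypothetical projection: top-d eigenvectors of X stand in for the
# learned (2D)2PCA projection matrix W.
d = 5
_, vecs = np.linalg.eigh(X)
W = vecs[:, -d:]                                       # D x d
X_low = W.T @ X @ W                                    # d x d low-dimensional descriptor
print(X_low.shape)                                     # (5, 5)
```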
    • Chuan He, Jun Zhou
      Vol. 23, Issue 7, Pages: 973-983(2018) DOI: 10.11834/jig.170653
      Mesh-based image stitching algorithm with linear structure protection
      Abstract: Objective: Image registration methods based on mesh deformation can handle some parallax in the overlapping area of the input images and can adapt to complex scenes in which the scenery does not lie in a single plane. A new mesh-based image stitching algorithm with linear structure protection (MISwLP) is proposed. The algorithm applies constraints to lines extracted from the images to protect them from being distorted by the mesh deformation process, thus producing natural panoramas with reduced distortion.

      Method: MISwLP is based on mesh deformation: images are meshed with a set of vertices, and image deformation is guided by the indexed vertices. The algorithm is implemented in four steps. The first step is APAP (as-projective-as-possible image stitching with moving DLT) pre-registration. The APAP algorithm is applied to align the images, and the feature matching pairs obtained by APAP are used to derive all the vertex matching pairs in the overlapped area of the image pairs, which are called matching points. These matching points are distributed evenly and give the mesh optimization model good alignment capability. The second step is global similarity estimation. The relative 2D rotation angle and the relative scale between two images are estimated, and a similarity transform between the two images is then constructed. In the third step, a mesh optimization model is established for the input vertices of the images. The mesh optimization process has two stages. In the first stage, the energy function includes three terms, namely, the alignment, local similarity, and global similarity terms, and the original vertices are used as the input of this function. The function is solved by the least-squares conjugate gradient method. The first stage aims to align the images. The outputs of the first stage are then used as the input vertices of the second stage, in which a new term called line protection is added for further optimization. The lines are extracted by the LSD algorithm with a threshold or through a user-guided interface and then sampled across the grid. The line protection term constrains the sample points to remain on a straight line. The optimization is solved efficiently with a sparse matrix. At this point, the lines distorted in the first stage are straightened. In the fourth step, a texture mapping method is applied by affine-transforming the input grids into the output grids, and all images are blended with a linear blending method.

      Result: The performance of MISwLP is verified using images captured from different scenes by handheld devices, such as mobile phones and digital cameras, and on several open datasets. The scenes include urban and natural scenery. MISwLP can handle more complicated image stitching tasks, in which the scenery consists of two planes, than image stitching algorithms that use only one global homography, such as AutoStitch. Furthermore, MISwLP produces natural stitching results with reduced projective distortion. In addition, MISwLP outperforms several state-of-the-art methods, such as SPHP (shape-preserving half-projective warps for image stitching), APAP, and NISwGSP (natural image stitching with the global similarity prior). These algorithms use a similarity transform to protect the non-overlapping area from projective distortion; consequently, inconsistency is introduced between the overlapped and non-overlapped areas, and the human eye can perceive the destruction of certain geometric structures in the transitional area. MISwLP handles this problem with a line protection term and provides good results with only a few geometric distortions. The proposed method works especially well for urban scenes that contain many linear structures. For scenes with no evident geometry, a user-guided auxiliary method is provided for selecting the lines to protect. MISwLP builds on the NISwGSP algorithm, and the experiments show that their time complexities are nearly the same.

      Conclusion: The performance of the proposed method is superior to those of state-of-the-art image stitching methods. MISwLP protects linear structures during image stitching, thereby providing good stitching results with few geometric and projective distortions. Therefore, MISwLP has good application value.
      Keywords: image stitching; mesh deformation; line protection; energy function; optimization; least-squares conjugate gradient method; projective distortion
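      The two-stage mesh optimization reduces to sparse linear least-squares problems. The sketch below shows only that generic solve step under invented matrices; the actual alignment, similarity, and line-protection rows of the paper's energy are not reproduced here.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import lsqr

# Stack the quadratic energy terms as rows of a sparse system A v = b over
# the 2D mesh vertex coordinates v and solve in the least-squares sense
# (a conjugate-gradient-type solver, as in the least-squares CG step).
n_vertices, n_constraints, nnz = 100, 400, 1200
rng = np.random.default_rng(1)

rows = rng.integers(0, n_constraints, nnz)
cols = rng.integers(0, 2 * n_vertices, nnz)
vals = rng.standard_normal(nnz)
A = sparse.csr_matrix((vals, (rows, cols)),
                      shape=(n_constraints, 2 * n_vertices))
b = rng.standard_normal(n_constraints)

v = lsqr(A, b)[0]        # optimized vertex coordinates (x0, y0, x1, y1, ...)
print(v.shape)           # (200,)
```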
    • Xianguo Li, Yemei Sun, Yanli Yang, Changyun Miao
      Vol. 23, Issue 7, Pages: 984-993(2018) DOI: 10.11834/jig.170538
      Image super-resolution reconstruction based on intermediate supervision convolutional neural networks
      Abstract: Objective: Learning-based image super-resolution reconstruction has recently become a research hotspot. A new image super-resolution reconstruction method based on an intermediate supervision convolutional neural network (CNN) is proposed to address the problems of the original super-resolution CNN (SRCNN), namely, few network layers, a small receptive field, and functionality limited to a single scale, and to further improve the quality of image reconstruction.

      Method: This method is based on deep CNNs. First, when information about the input or gradient passes through many layers, it can vanish and be "washed out" by the time it reaches the end (or beginning) of the network. Therefore, we design a CNN structure with an intermediate supervision layer. The learning architecture has 16 weight layers, and the receptive field used for reconstruction is considerably larger (31×31 versus 13×13). All layers are of the same type, except for the first, seventh, and last layers: 64 filters of size 3×3×64, where each filter operates on a 3×3 spatial region across 64 channels (feature maps). Each convolutional layer is followed by a rectified linear unit as the activation function. The first convolutional layer operates on the input image. The seventh layer is an intermediate supervision layer that can guide the training of the preceding layers in the CNN; this guidance can be considered an implicit deep supervision adopted to strengthen the learning capability during training. The last layer, which uses a single filter of size 3×3×64, is used for image reconstruction. Second, the supervision-layer and reconstruction loss functions are defined to address the vanishing gradient problem of the deep CNN. The training procedure includes three steps: image preprocessing, feature extraction, and image reconstruction. In the first step, the network is trained on low-resolution images blurred by different upscaling factors (2, 3, 4, possibly including fractional factors) so that it can reconstruct images with different degrees of blurring well. In the second step, image features are extracted using convolution operations. In the SRCNN, a center pixel is inferred from its surrounding pixels, and edge pixels are not fully utilized. We pad by one pixel before each convolution to keep the sizes of all feature maps (including the output image) uniform, thereby increasing the use of edge information in images and feature maps. In the last step, because the input and output (predicted) images are highly similar, the high-resolution image is reconstructed by the residual learning method, and a smooth loss function with good generalization performance is easily achieved through a comprehensive use of shallow features.

      Result: The proposed method is evaluated on the open challenge datasets Set5 and Set14, which are often used for super-resolution methods. Experimental results show that the proposed method yields a better subjective visual effect and objective quantitative evaluation than bicubic interpolation, A+, SelfEx, and SRCNN. For subjective visual evaluation, the proposed method produces reconstructed images with superior clarity and edge sharpness. For objective evaluation, the average peak signal-to-noise ratio (PSNR) achieved by this method is 2.26 dB, 0.28 dB, 0.28 dB, and 0.15 dB higher than those attained by the other approaches, respectively. Meanwhile, the time consumed is less than half that of the SRCNN method when using the trained network models to reconstruct images.

      Conclusion: By introducing intermediate supervision into our network, the flow of information and gradients can be propagated smoothly throughout the entire network, thereby enhancing the reconstruction capability of the network and the training efficiency. Extensive experiments confirm that the proposed method with intermediate supervision improves the quality and efficiency of image super-resolution reconstruction. This approach has good generalization capability and can be used for the super-resolution reconstruction of natural scene images.
      Keywords: super-resolution reconstruction; deep learning; intermediate supervision; convolutional neural network; vanishing gradients; residual learning
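      Two pieces of the pipeline are easy to show generically: the residual-learning reconstruction (output = interpolated input + predicted residual) and the PSNR metric used in the evaluation. The sketch below uses random arrays in place of a real image and a trained network.

```python
import numpy as np

def psnr(x, y, peak=255.0):
    """Peak signal-to-noise ratio between two images."""
    mse = np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(2)
lr_upscaled = rng.uniform(0, 255, (32, 32))    # stands in for the bicubic input
residual = rng.normal(0, 1, (32, 32))          # stands in for the network output
sr = np.clip(lr_upscaled + residual, 0, 255)   # residual learning: y = x + r
print(f"PSNR vs. input: {psnr(sr, lr_upscaled):.2f} dB")
```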
    • Zhe Li, Jianzeng Li, Yan Zhang, Zhe Wang
      Vol. 23, Issue 7, Pages: 994-1004(2018) DOI: 10.11834/jig.170599
      Mixed-feature regularization constraint for motion blur blind restoration
      Abstract: Objective: Motion blur blind restoration is the process of restoring a clear image without knowledge of the motion blur kernel function of the image. Solving for the blur kernel and the clear image accurately and efficiently is the key to blind restoration. The regularization constraint technique brings the reconstructed image close to the ideal image by adding appropriate regularization terms with prior knowledge to the model, and the blind restoration problem can be solved quickly and effectively with this technique. Unlike single regularization methods, mixed-feature regularization can use multiple known priors to impose numerous constraints on the model; while improving the accuracy of the model solution, it also reduces the number of iterations. Therefore, multiple mixed-feature regularization constraints for motion blur blind restoration are worth studying. To address the problems of poor anti-noise performance, incomplete blur kernel smoothness constraints, and edge blurring of restored images in existing methods, a mixed-feature regularization constraint method for motion blur blind restoration is proposed.

      Method: First, to improve the accuracy of the estimated blur kernel model, the edges of the image are extracted and used in the fidelity term to estimate the blur kernel. Edge extraction is easily disturbed by noise, which leads to false edges or additional noise and consequently reduces the estimation accuracy of the blur kernel model. Therefore, a local weighted total variation structure extraction algorithm is used to detect the significant edges of the blurred image and thus improve the anti-noise performance of the blur kernel model. Then, a sparse regularization term and an improved smoothness regularization term are added to improve the sparsity and smoothness of the blur kernel. L0-norm smoothness regularization is complicated; thus, the L1 norm of the blur kernel is used to achieve the sparsity constraint. Because the Tikhonov regularization term suppresses outliers insufficiently, an improved multiple mixed regularization term is used to achieve the smoothness constraint while further suppressing outliers. Finally, so that the restored image has a heavy-tailed gradient character and sharp edges, hyper-Laplacian prior and edge-preserving regularization terms are added. The hyper-Laplacian prior fits the gradient distribution of the clear image better than other prior distributions and thus enriches the edge details. Because the heavy-tailed constraint alone yields poor edge resolution in the restored image, edge-preserving regularization terms are added to make the edges of the restored image close to the sharpened edges of the clear image.

      Result: In this study, the advantages of the improved model and the proposed algorithm are verified by two groups of experiments. In the first experiment, simulated motion-blurred images are taken as the experimental objects. A comparison and analysis of the restoration effects of five combinations of steps verify the robustness of the improved blur kernel model and the improved restored image model. Experimental results show that the edge details of images restored with the improved model are clear and natural and that the evaluation indexes are noticeably improved. In the second experiment, real motion-blurred images from a small UAV are used as the experimental objects, and the robustness and practicability of the proposed algorithm are analyzed. Moreover, the proposed algorithm is compared with traditional algorithms. Experimental results show that the standard deviation of the restored image increases by approximately 11.4%, the average gradient increases by approximately 30.1%, and the information entropy increases by approximately 2.2%. Furthermore, our method produces better subjective visual effects than the traditional methods.

      Conclusion: For the blind restoration of motion-blurred images, the superiority of the improved model proposed in this work is demonstrated through theoretical analysis and experimental verification. Restoration with the proposed algorithm is better than that with traditional algorithms.
      Keywords: blind image restoration; motion blur; mixed feature; regularization constraint; edge detection; blur kernel
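      As a minimal sketch of the mixed regularization idea (generic formulas with assumed weights, not the paper's exact energy), the snippet below evaluates an L1 sparsity term on the blur kernel plus a hyper-Laplacian term |∇x|^α on the latent image gradients.

```python
import numpy as np

def regularization_energy(k, x, lam_k=0.01, lam_x=0.005, alpha=0.8):
    """L1 kernel sparsity + hyper-Laplacian (heavy-tailed) gradient prior."""
    gx = np.diff(x, axis=1)                    # horizontal gradients
    gy = np.diff(x, axis=0)                    # vertical gradients
    sparsity = lam_k * np.abs(k).sum()         # L1 norm of the blur kernel
    heavy_tail = lam_x * ((np.abs(gx) ** alpha).sum()
                          + (np.abs(gy) ** alpha).sum())
    return sparsity + heavy_tail

rng = np.random.default_rng(3)
kernel = rng.uniform(0, 1, (15, 15))
kernel /= kernel.sum()                         # kernel sums to 1
image = rng.uniform(0, 1, (64, 64))
print(f"regularization energy: {regularization_energy(kernel, image):.3f}")
```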
    • Sanli Yi, Sijie Li, Jianfeng He, Guifang Zhang
      Vol. 23, Issue 7, Pages: 1005-1013(2018) DOI: 10.11834/jig.170575
      Application of weighted nuclear norm denoising algorithm in diffusion-weighted image
      Abstract: Objective: Diffusion-weighted imaging is a noninvasive method of detecting the diffusion of water molecules in living tissues and requires highly accurate data. Diffusion-weighted images have a high degree of self-similarity and rich feature details, but their acquisition is often corrupted by noise and artifacts. Diffusion tensor images are calculated from diffusion-weighted images, and diffusion tensor imaging (DTI) is widely used in nerve fiber tracking in human brains. Noise affects the data accuracy of the diffusion tensor image, can cause erroneous fiber tracking, and affects subsequent processing. Therefore, the noise in diffusion-weighted images should be reduced. Denoising is not only an important preprocessing step for many vision applications but also an ideal test bed for evaluating statistical image modeling methods.

      Method: According to the characteristics of diffusion-weighted images, the weighted nuclear norm denoising algorithm, which exploits the nonlocal self-similarity of the image, is applied to diffusion-weighted image denoising. First, the diffusion-weighted image is divided into many target blocks, and nonlocal similar blocks are obtained by block matching; in practice, they can be collected from a sufficiently large local window instead of the entire image. Second, the obtained nonlocal similar blocks are stacked into a similar-block matrix, which is then decomposed by singular value decomposition. Large singular values are more important than small ones because they represent the energy of the major components of the image. Therefore, different singular values are assigned different weights. Third, the singular values are shrunk by the soft-thresholding operator to obtain the denoised nonlocal similar blocks; the larger the singular value, the less it is shrunk. By aggregating the denoised blocks, the target block can be estimated. Finally, by applying the above procedure to each target block and aggregating all blocks, the denoised image is reconstructed.

      Result: The weighted nuclear norm denoising algorithm is compared with traditional diffusion-weighted image denoising algorithms, such as the anisotropic algorithm and the texture detection algorithm, in simulation and real-data experiments. Simulation results show that the peak signal-to-noise ratio of the weighted nuclear norm denoising algorithm is at least 20 dB higher than those of the other traditional algorithms, and its structural similarity value is 0.2 to 0.5 higher than those of the other algorithms. In the real-data experiment, the neural fibers obtained by tracking on diffusion-weighted images denoised by different algorithms are compared. According to our findings, using the number of fibers or the length of the longest fiber to judge the effect of noise reduction fails to represent the denoising effect satisfactorily. Therefore, the average fiber length is proposed to express the denoising effect: the longer the average length, the better the denoising effect and the smoother the fibers. Results show that the nerve fibers obtained after denoising with the weighted nuclear norm algorithm are sufficiently long on average and smooth in texture.

      Conclusion: The experimental analysis shows that the weighted nuclear norm denoising algorithm maximizes the use of the self-similarity of diffusion-weighted images and achieves denoising through the processing of similar blocks. The algorithm can not only reduce the noise in diffusion-weighted images, with visible peak signal-to-noise ratio improvements over state-of-the-art methods such as texture detection, but also preserve the local structures of the image better and generate fewer visual artifacts. The proposed algorithm can improve the accuracy and validity of DTI data, which is helpful in the subsequent processing of images.
      Keywords: diffusion-weighted imaging; weighted nuclear norm denoising algorithm; image denoising; peak signal-to-noise ratio; nerve fiber tracking
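      The core shrinkage step is compact enough to sketch directly: stack similar patches into a matrix, take the SVD, and soft-threshold each singular value with a weight inversely proportional to its magnitude so that large (signal-carrying) values are shrunk less. The constant c below is an assumed parameter, not the paper's setting.

```python
import numpy as np

def wnnm_denoise(patch_matrix, c=30.0, eps=1e-8):
    """Weighted nuclear norm shrinkage of a similar-patch matrix."""
    U, s, Vt = np.linalg.svd(patch_matrix, full_matrices=False)
    w = c / (s + eps)                 # larger singular value -> smaller weight
    s_shrunk = np.maximum(s - w, 0)   # weighted soft-thresholding
    return U @ np.diag(s_shrunk) @ Vt

rng = np.random.default_rng(4)
clean = np.outer(rng.standard_normal(36), rng.standard_normal(20))  # rank-1 "signal"
noisy = clean + 0.5 * rng.standard_normal((36, 20))
denoised = wnnm_denoise(noisy)
# Shrinkage typically reduces the error toward the clean matrix:
print(np.linalg.norm(denoised - clean) < np.linalg.norm(noisy - clean))
```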
    • Deqin Xu, Weixin Bian, Xintao Ding, Yuxiang Ding
      Vol. 23, Issue 7, Pages: 1014-1023(2018) DOI: 10.11834/jig.170632
      Fingerprint enhancement using sparse representation by multi-scale classification dictionaries
      Abstract: Objective: Most automatic fingerprint identification systems (AFISs) are based on minutiae matching, and the accuracy and reliability of minutiae extraction depend largely on the quality of the input fingerprint image. Thus, the performance of these AFISs is largely determined by the quality of the input fingerprint images. In practice, the quality of a fingerprint image may suffer from various impairments, and the image may exhibit ridge adhesions, ridge fractures, or uneven contrast. To improve the performance of AFISs, the quality of fingerprint images must be enhanced. This study proposes a novel fingerprint enhancement algorithm that uses sparse representation with multi-scale classification dictionaries.

      Method: First, we sample high-quality training fingerprints to build the training set for multi-scale classification dictionary learning, and the multi-scale classification dictionaries are learned from this training set. A crucial issue in enhancing fingerprint images is obtaining an effective prior or constraint. Unlike generic images, fingerprint images have a steady and reliable ridge pattern. To obtain an effective prior, fingerprint patch orientations are estimated by weighted linear projection analysis (WLPA) on the basis of the vector set of point gradients. We classify the training samples of the same size into eight groups according to their ridge orientation pattern. Instead of learning a single dictionary, we learn a classification dictionary for each class of the same size. Second, the fingerprints are pre-enhanced using the linear contrast stretching method. The unused gray levels of the fingerprint image are exploited, and the image contrast is stretched to cover the entire grayscale space. Consequently, the gray level information of the input fingerprint is preserved, and the contrast enhancement is improved. The contrast-enhanced fingerprint contributes to the subsequent enhancement. Third, a fingerprint has a unique natural pattern that is suitable for frequency-domain analysis. Generally, a good frequency-domain fingerprint enhancement approach works through spatial partitioning and frequency-domain enhancement. Thus, the fingerprint is partitioned into patches in the spatial domain on the basis of a non-overlapping window, the orientations of the fingerprint patches are estimated by WLPA, and the quality of each patch is evaluated and classified by the coherence of the point orientations. Finally, the fingerprint patches are transformed to the frequency domain by the 2D discrete Fourier transform. The enhancement model of the patch spectrum is constructed via sparse representation modeling with the classification dictionaries. The patch spectra are enhanced on the basis of a quality grading scheme and a composite strategy that combines multi-scale classification dictionary learning with spectrum diffusion. Each fingerprint patch is enhanced according to its own priority, and higher-quality patches are enhanced before lower-quality ones. Multi-scale classification dictionary learning ensures the reliability of the enhancement. Spectrum diffusion is applied with the help of the quality grading and neighborhood priority scheme and the composite window strategy, and it improves the quality of low-quality patch spectra. Spectrum diffusion provides accurate ridge spectrum information for lower-quality patches, thus ensuring the reliability of the ridge spectra for enhancement with the multi-scale classification dictionaries.

      Result: The proposed method is implemented and tested on fingerprint images from FVC2004. Visual experiments and performance evaluations of minutiae extraction are presented. We compare our method with state-of-the-art fingerprint enhancement methods and find that the proposed method is superior in enhancing fingerprint images. Experimental results demonstrate that low-quality fingerprints can be effectively enhanced by the proposed method. Compared with traditional fingerprint enhancement algorithms, the proposed method is more robust against noise and exhibits a more prominent effect on low-quality fingerprint images.

      Conclusion: By introducing a ridge pattern prior into classification dictionaries, a classification dictionary is learned for each class of the same size. Classification dictionaries based on the ridge pattern constraint can capture a reliable ridge pattern prior, and using them improves the effectiveness of the sparse modeling of the information in a fingerprint patch. The quality grading scheme and the composite window strategy help the multi-scale dictionary overcome the contradiction between accuracy and anti-noise capability. Furthermore, the combination of the composite window and quality evaluation ensures that spectrum diffusion is applied successfully. The proposed method significantly improves the quality of low-quality input fingerprints.
      Keywords: fingerprint; patch quality evaluation; multi-scale classification dictionaries; sparse representation; spectrum diffusion
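      Patch orientation estimation from point gradients can be illustrated with the classic averaged-gradient formula, used here as a generic stand-in for the WLPA estimator named above.

```python
import numpy as np

def patch_orientation(patch):
    """Dominant ridge orientation of a patch via averaged gradients."""
    gy, gx = np.gradient(patch.astype(np.float64))
    gxx, gyy, gxy = np.sum(gx * gx), np.sum(gy * gy), np.sum(gx * gy)
    theta = 0.5 * np.arctan2(2 * gxy, gxx - gyy)   # dominant gradient direction
    return theta + np.pi / 2                       # ridges run orthogonal to it

# Synthetic patch with horizontal ridges (intensity varies along rows only).
rows = np.arange(16)[:, None] * np.ones((1, 16))
patch = np.sin(0.8 * rows)
print(f"{patch_orientation(patch):.2f} rad")  # ~3.14, i.e., 0 mod pi: horizontal
```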

      Image Analysis and Recognition

    • Pan Gao, Guangshuai Liu, Ziheng Ma, Yafeng Yu
      Vol. 23, Issue 7, Pages: 1024-1032(2018) DOI: 10.11834/jig.170417
      Enhanced pairwise rotation-invariant co-occurrence extended local binary pattern
      Abstract: Objective: As effective local texture descriptors, the local binary pattern (LBP) and its variants are widely applied in various fields of image processing. As an LBP variant, the pairwise rotation-invariant co-occurrence LBP (PRICoLBP) algorithm classifies texture better than other LBP variants by extracting high-order curvature and contextual co-occurrence information between spatial-context co-occurrence pixel points. However, PRICoLBP determines these points using the pixel gray gradient vector, which changes with image rotation, and describes the image texture using the co-occurrence histogram between the original LBP and uniform LBP (LBPu2) features of the context co-occurrence pixels. Consequently, the PRICoLBP algorithm is not robust to variations in image illumination and rotation and has high feature dimensionality. An efficient texture feature named enhanced pairwise rotation-invariant co-occurrence extended LBP (ELBP), which can fuse various kinds of local texture structure information, is proposed in this work to address these problems.

      Method: A binary coding sequence is obtained by binary quantization of the neighborhood pixel gray values of each pixel. The LBP values corresponding to the different neighborhood starting points of each pixel, obtained by continuously rotating the binary coding sequence, are examined, and the neighborhood points corresponding to the maximum and minimum LBP values are used as the initial coding points. Two rotation-invariant co-occurrence direction vectors are then determined from the central pixel and the neighborhood initial coding points corresponding to the maximum and minimum LBP values of each pixel, respectively. Two spatial-context co-occurrence pixel points at different scales are selected along the two direction vectors on two grayscale images. Then, the correlation information among the central pixel gray level feature, the neighborhood pixel gray level feature, and the radial gray level difference feature of the spatial-context co-occurrence points is extracted using the rotation-invariant uniform descriptor of the ELBP algorithm. The texture structure of a complex image is described by cascading the ELBP features of the spatial-context co-occurrence points. Finally, a chi-square kernel support vector machine, trained using the texture feature histograms of the spatial-context co-occurrence pixel pairs, is used to detect the image texture categories.

      Result: Under the same experimental conditions, the classification recognition rate of the proposed method improves by 0.32%, 0.57%, 5.62%, 3.34%, 2.1%, and 4.75% on the Brodatz, Outex (TC10, TC12), Outex (TC14), CUReT, KTH-TIPS, and UIUC texture databases, respectively, compared with the original PRICoLBP algorithm.

      Conclusion: The initial coding sequences corresponding to the maximum and minimum LBP feature values of each pixel are used to select the spatial-context co-occurrence pixel pairs, and the rotation-invariant uniform descriptor of the ELBP algorithm is adopted to capture the local texture structure information of the context co-occurrence pixel pairs. Therefore, the proposed algorithm describes the high-order curvature information and more of the local texture structure between context co-occurrence pixel pairs better than the original PRICoLBP algorithm. In image classification experiments on the Outex, CUReT, and KTH-TIPS image libraries, which contain variations in texture illumination and rotation, the proposed algorithm not only exhibits a higher classification recognition rate than the original PRICoLBP algorithm but also captures richer local texture feature patterns with reduced feature dimensionality. The experimental results show that the improved algorithm is more robust to variations in texture illumination and rotation than numerous state-of-the-art LBP variants under the same conditions. The algorithm can also be effectively applied to image classification under complex environmental changes because of its high robustness and distinctiveness.
      Keywords: machine vision; pattern recognition; local binary pattern (LBP); spatial context; pairwise rotation invariance; extremum; robustness
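      The rotation-invariance device described above (rotating the binary neighbor sequence and anchoring the code at an extremum) can be illustrated with a generic rotation-invariant LBP on a 3×3 patch; this is a simplified stand-in, not the full ELBP descriptor with its co-occurrence vectors.

```python
import numpy as np

def lbp_rotation_invariant(patch3x3):
    """Min over all rotations of the 8-neighbor binary code (LBP-ri)."""
    c = patch3x3[1, 1]
    # 8 neighbors in circular order around the center
    nb = patch3x3[[0, 0, 0, 1, 2, 2, 2, 1], [0, 1, 2, 2, 2, 1, 0, 0]]
    bits = (nb >= c).astype(int)
    codes = [int("".join(map(str, np.roll(bits, r))), 2) for r in range(8)]
    return min(codes)  # anchoring at the minimum makes the code rotation invariant

patch = np.array([[5, 9, 2],
                  [3, 6, 8],
                  [7, 1, 4]])
print(lbp_rotation_invariant(patch))  # same value for any rotation of the ring
```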
    • Xinnan Fan, Jianyue Chen, Xuewu Zhang, Pengfei Shi, Zhuo Zhang
      Vol. 23, Issue 7, Pages: 1033-1041(2018) DOI: 10.11834/jig.180009
      Underwater polarized images restoration algorithm based on structural similarity
      Abstract: Objective: Numerous restoration algorithms for single images exist. They perform remarkably in defogging sky images, but most of them cannot be applied directly to the restoration of underwater images. Image restoration aims to process degraded images to recover the original image (before degradation). Underwater illumination is insufficient and unevenly distributed, and these light variations affect the results obtained by single-image restoration methods; in general, the results are unsatisfactory. Polarization is a basic feature of light, and the light reflected by underwater objects is mostly partially polarized. Therefore, underwater polarized images have special polarization characteristics, and underwater image restoration based on multiple polarized images has gradually become popular in recent years. Focusing on the haziness and indistinct details of underwater polarized images, a restoration method for underwater polarized images based on structural similarity is proposed. This method is expected to improve the clarity, contrast, and color fidelity of images.

      Method: First, images taken through a polarizer at orthogonal orientations are obtained; these contain the maximum and minimum backscatter. The water transmittance is related only to the depth of field and the attenuation coefficient of the water body, whereas the object radiance depends on the incident light and the surface characteristics of the object. Therefore, we can assume that they are mutually independent. Structural similarity measures the similarity of two images in terms of brightness, contrast, and structure and can directly describe the correlation between the two images. Second, on the basis of the independence between the transmittance and the object radiance, a formula for the water transmittance is derived from the structural similarity. The difference of the two polarized images is also the difference of the background light in these images, which is a function of the depth of field. Thus, the polarized-difference image is used to compute the initial value of the transmittance during the iterative solution. An accurate transmittance is necessary for good image restoration. Finally, the object radiance is obtained by inverting the underwater polarization imaging model, and color is corrected to produce the restored image. The color correction is based on a single point: a point with well-kept color information is chosen as the reference pixel, and the global pixels are normalized by the reference pixel to realize the color correction of the entire image.

      Result: In the experiments, the proposed algorithm is compared with two other polarized restoration algorithms to test its effectiveness, and several groups of underwater polarized images are selected as research objects. The images used in this study were obtained from related studies. Quantitative indicators, such as contrast, information entropy, gray mean grads (GMG), peak signal-to-noise ratio (PSNR), measure of enhancement (EME), and runtime, are used to evaluate the effect. Results show that the contrast, information entropy, and GMG of our method are better than those of the two other algorithms, and a noticeable restoration improvement is achieved. The YY algorithm removes the blur of the original images to a certain extent, but certain object areas of the recovered images are supersaturated. The images restored by the Huang algorithm are generally too dark to allow identification of scene details because of the inaccurate estimation of the degree of polarization of the object radiance. A comparison of the evaluation parameters shows that the contrast and GMG of our method are twice as high as those of the YY algorithm. Furthermore, the color distribution of the images recovered by our method is more homogeneous than that of YY, resulting in sufficient image information and the highest information entropy. The prominent EME also shows that our results have clear texture, high contrast, and good restoration. Certain color channels of the images obtained by the Huang algorithm are not recovered; thus, these images have single color tones, and the values of several color channels are as low as those of the raw images, resulting in a small mean square error and an extremely high PSNR. In terms of time cost, our method and the Huang algorithm run relatively longer than YY because of the parameter traversal process.

      Conclusion: On the basis of the analysis of the underwater polarization imaging model and the statistical independence between the object radiance and the water transmittance, image restoration is conducted successfully after the transmittance is estimated. The problems of blurred details and low contrast in polarized underwater images are effectively solved. The subjective and objective analyses show that the proposed algorithm can recover polarized underwater images effectively and obtain restored images with high contrast, obvious details, and rich color. Compared with other algorithms, the proposed algorithm significantly improves the contrast, clarity, and color balance of polarized underwater images, thus providing an important foundation for underwater target recognition and analysis.
      Keywords: underwater polarization imaging; image restoration; structural similarity; transmittance; image processing
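      The model inversion in the final step can be sketched with the classic polarization imaging formulation (used here as a generic stand-in for the paper's model): estimate backscatter from the polarized-difference image and recover object radiance as L = (I - B) / t. The degree of polarization and transmittance below are assumed constants, not values estimated by the algorithm.

```python
import numpy as np

def restore(i_max, i_min, p_scat, t):
    """p_scat: degree of polarization of backscatter; t: water transmittance."""
    total = i_max + i_min                      # total intensity image
    backscatter = (i_max - i_min) / p_scat     # from the polarized difference
    return (total - backscatter) / np.clip(t, 1e-3, 1.0)

rng = np.random.default_rng(5)
i_max = rng.uniform(0.4, 0.9, (8, 8))          # worst-backscatter image
i_min = i_max * rng.uniform(0.5, 0.9, (8, 8))  # best-backscatter image
L = restore(i_max, i_min, p_scat=0.7, t=0.6)
print(L.shape)                                 # (8, 8) recovered radiance
```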
    • Qiang Wang, Yingle Fan, Wei Wu, Yaping Zhu
      Vol. 23, Issue 7, Pages: 1042-1051(2018) DOI: 10.11834/jig.170474
      Mixed recognition of upright and inverted faces
      摘要:ObjectiveWith the high-resolution imaging and hardware capability of parallel computing, face recognition based on massive visual data has become a research focus in pattern recognition and artificial intelligence. To a certain extent, traditional face recognition algorithms also consider the principle of biological perception, such as using massive training sample data for dynamically modifying the structure and parameters of neutral networks and realizing optimal decisions. However, these methods use only several basic characters of biological perception and simulate them as a black box overall. The abundant visual mechanisms in biological perception systems are the bases of realizing visual comprehension and recognition. The mechanism of recognizing inverted faces on the basis of the different information flows of visual neural systems has been demonstrated. A new face recognition method is proposed to solve mixed upright and inverted face recognition using global and local visual neural information flow.MethodThe recognition of facade faces may depend on the mode of a component architecture, where the overall information is larger than the sum of the local features. The identification of an inverted face does not significantly depend on the characterization of the abovementioned overall information. Eyes, mouths, and noses are also characteristics of local features of information sources. Two visual cortical sensing pipelines reflect the global and local features in face recognition mechanisms. However, most methods consider the two pipelines or systems to be operating independently and not transforming information with each other. Therefore, a divide-and-conquer strategy is adopted in practice. However, this work argues that orthographic and inverted faces represent not merely a simple inversion of visual information. The two visual pathways that convey holistic and local features play a decisive role in orthographic and inverted face recognition and are not independent of each other. The two pipelines portrayed by face information should have a complementary relationship. In the use of global contour information for face identification, the contribution of face recognition performance to the facial features cannot be dismissed. In this work, we constructed a new face recognition system that is based on global and local information, which is transformed by two pipelines in visual cortical pathways. Our study considered the process of the visual cortical pathway that is based on the left and right hemisphere coordination mechanisms. First, the underlying neural network was constructed, and the redundancy reduction and preprocessing of upright and inverted face images were realized through the mechanism of sensitive texture and symmetric convolution kernels. Second, this work proposes the pooled neural network layer, which is based on local region extraction, and constructed the network structure of multi-local feature fusion to realize compression extraction and fusion of local information. Finally, a predictive function was defined according to the characteristics of left and right hematopoietic collaboration in the advanced visual cortex to integrate the global and local information.ResultVisual test and quantitative calculation results showed that the method had an enhanced feature capability in face recognition and could better identify upright and inverted faces in comparison with the traditional methods LDA, PCA, and DeepID. 
The experimental model was trained within the Caffe deep learning framework, and the model parameters were learned via batch gradient descent. Taking the AT&T face database as an example, the multi-local-feature fusion network structure was added to a classical convolutional neural network (CNN) model. The recognition accuracy improved from 98% to 100%, indicating that the local information can improve the recognition of upright faces. In the experiment, the difference calculation showed that the underlying convolution kernels were symmetric and responded identically to the texture features of faces. An appropriate training dataset was used to adjust the relationship between the global and local information during fusion. The recognition rates of the model were 98% and 94% for upright and inverted faces, respectively; therefore, the model performed well on both upright and inverted face recognition. According to the pre-trained face recognition model, the two-pipeline face recognition system exhibited satisfactory performance on a test dataset that mixed upright and inverted faces. Thus, our method can address mixed upright-and-inverted face recognition.ConclusionIn this work, a pooling layer based on local features was designed on the basis of the CNN's texture sensitivity to input image features to realize a multi-local-feature fusion network structure. Meanwhile, in consideration of the biological mechanism of local participation in recognition, the relationship between the left and right hemispheres in the advanced visual cortex was introduced, and a prediction function integrating global and local information was proposed. The correlation between training data factors and local or overall characteristics was emphasized. The proposed face recognition method contributes to the understanding of optic nerve mechanisms. For example, the traditional neural network augmented with multi-local feature fusion enhanced the face recognition features and thus increased the effectiveness of the information. Compared with a training dataset of inverted faces alone, the mixed training dataset of upright and inverted faces had a larger impact on inverted face recognition. Results showed the importance of inconsistency in the selection of local features and the crucial role of internal differences among local features in face recognition. The hybrid recognition method of upright and inverted faces proposed in this work provides a novel research idea for face recognition technology and discusses the role of multi-visual-pathway fusion in image understanding and visual cognition in the advanced visual cortex.
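      As a rough illustration of the multi-local-feature fusion described above, the following Python sketch pools fixed facial regions from a convolutional feature map and combines global and local scores through a weighted prediction function. The region coordinates, pooling size, fusion weight, and all other parameters are illustrative assumptions, not the paper's actual settings.

      # Minimal sketch of a pooling layer based on local region extraction,
      # assuming fixed crop boxes for eyes, nose, and mouth on the feature
      # map; all coordinates and sizes below are illustrative.
      import torch
      import torch.nn as nn

      class LocalRegionPooling(nn.Module):
          """Pools a fixed set of local regions and fuses them into one vector."""
          def __init__(self, regions, out_size=4):
              super().__init__()
              self.regions = regions                  # list of (top, left, h, w)
              self.pool = nn.AdaptiveAvgPool2d(out_size)

          def forward(self, fmap):
              parts = []
              for top, left, h, w in self.regions:
                  crop = fmap[:, :, top:top + h, left:left + w]  # one local region
                  parts.append(self.pool(crop).flatten(1))       # compress it
              return torch.cat(parts, dim=1)          # multi-local feature fusion

      def predict(p_global, p_local, alpha=0.6):
          """Hypothetical prediction function integrating global and local scores."""
          return alpha * p_global + (1.0 - alpha) * p_local

      regions = [(2, 2, 8, 20), (10, 8, 8, 8), (18, 6, 6, 12)]  # eyes, nose, mouth
      fusion = LocalRegionPooling(regions)
      fmap = torch.randn(1, 64, 28, 28)               # toy convolutional feature map
      print(fusion(fmap).shape)                       # torch.Size([1, 3072])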
      关键词:face recognition;inverted faces;multiple local feature fusion;visual pathway;visual mechanism;convolutional neural network   
    • Fuqing Zhu, Xiangwei Kong, Haiyan Fu, Qi Tian
      Vol. 23, Issue 7, Pages: 1052-1060(2018) DOI: 10.11834/jig.170557
      Two-stream complementary symmetrical CNN architecture for person re-identification
      摘要:ObjectivePerson re-identification aims to identify persons of interest who appear in particular scenarios from mass surveillance data. Accurately implementing this process is critical. Thus, person re-identification has become a novel and challenging research topic for the public security community. The main challenge is the pedestrian variations in images, which are as follows. First, pedestrian poses vary considerably because of different human activities. Second, numerous camera perspectives exist because of the varying locations. Third, illumination differs across periods. These pedestrian variations compromise the performance of person re-identification. Recently, CNN-based deep learning methods have achieved great success in vision applications. CNN has also advanced person re-identification research, as demonstrated in several related works. The deep model, which can overcome these complex pedestrian variations effectively, has achieved better accuracy than traditional person re-identification methods. However, the number of annotated pedestrian images in existing person re-identification datasets is relatively small because of the difficulty of pedestrian annotation in practice. With such a limited training set, the training of the CNN model is insufficient under the existing one-stream architecture; consequently, the discriminative ability of the learned deep model is compromised. To address these problems, we propose a two-stream complementary symmetrical CNN model with an improved network structure for person re-identification.MethodThe newly designed architecture takes two-stream samples as input simultaneously. The two streams have complementary characteristics because their fully connected layers are concatenated. The input combination is diversified under the limited training set, thereby enriching the training process of the CNN model.ResultWe evaluate the proposed method and the baseline on two large-scale public person re-identification datasets, namely, Market-1501 and DukeMTMC-reID. On the Market-1501 dataset, the rank-1 and mAP accuracies are 73.25% and 48.44%, respectively. On the DukeMTMC-reID dataset, the rank-1 and mAP accuracies are 63.02% and 41.15%, respectively. The proposed method yields competitive performance against several existing person re-identification methods. Meanwhile, the proposed method demonstrates its effectiveness through a stable improvement over the baseline.ConclusionIn this work, we propose a novel two-stream complementary symmetrical CNN architecture for person re-identification. With the newly designed CNN architecture, the training of the CNN model can be adequate even under a limited training set. Therefore, the learned CNN model can obtain a highly discriminative representation of different pedestrians, and the performance of person re-identification is improved effectively.
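      The following minimal PyTorch-style sketch illustrates the two-stream idea under stated assumptions: both streams share one backbone, and their fully connected embeddings are concatenated before classification. The toy backbone and the 751 identity classes are illustrative choices, not the paper's exact architecture.

      # Sketch of a two-stream CNN with concatenated FC embeddings (assumed
      # weight sharing between streams; toy backbone for illustration only).
      import torch
      import torch.nn as nn

      class TwoStreamReID(nn.Module):
          def __init__(self, embed_dim=256, n_ids=751):
              super().__init__()
              self.backbone = nn.Sequential(           # shared, weight-tied stream
                  nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                  nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                  nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                  nn.Linear(64, embed_dim),
              )
              self.classifier = nn.Linear(2 * embed_dim, n_ids)

          def forward(self, xa, xb):
              fa, fb = self.backbone(xa), self.backbone(xb)        # two input streams
              return self.classifier(torch.cat([fa, fb], dim=1))   # concatenated FCs

      model = TwoStreamReID()
      xa = torch.randn(4, 3, 128, 64)   # stream A: pedestrian crops
      xb = torch.randn(4, 3, 128, 64)   # stream B: complementary samples
      print(model(xa, xb).shape)        # torch.Size([4, 751]) identity logits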
      关键词:public security;surveillance;person re-identification;convolutional neural network;deep learning;two-stream architecture;complementarity and symmetry
    • Haitao Chang, Junnian Gou, Xiaomei Li
      Vol. 23, Issue 7, Pages: 1061-1071(2018) DOI: 10.11834/jig.170577
      Application of faster R-CNN in image defect detection of industrial CT
      摘要:ObjectiveTraditional defect image recognition algorithms must manually construct and select the main characteristics of targets. The commonly used defect features are as follows: (1) shape characteristics, including circumference, area, aspect ratio, and circularity; (2) gray-scale features, including the mean and variance of the gray level; and (3) mathematical features of translation, proportion, and rotation invariance extracted by the Hu invariant moment method, with a suitable classifier selected to recognize the targets according to the selected features. The most frequently used classifiers include support vector machines, AdaBoost, and artificial neural networks. These methods rely on manually designed target features and classifiers to detect defects; the identification process is time consuming and has certain limitations in application. Therefore, this study proposes a defect detection method based on faster regions with convolutional neural network (faster R-CNN) features. The method uses convolutional networks to extract target features automatically and avoids the dependence of defect detection on manually designed characteristics.MethodThe method is based on a deep convolutional neural network. First, the detection targets are determined: three types of defects mainly exist in industrial computed tomography (CT) images, namely, slag inclusions, bubbles, and cracks. Then, the target images are manually annotated with rectangular boxes (ground truth [GT] boxes), the coordinate files for the GTs are generated, and 42 types of anchor boxes are selected according to the aspect ratios of the GT bounding boxes. The Laplace operator and the homomorphic filter are used to sharpen and enhance the dataset before training. The enhanced images are processed by the convolution and pooling layers. The convolution feature map is obtained and sent to the region proposal network for an initial judgment of targets (without specific category) versus background; meanwhile, the target border is roughly regressed. Then, the generated proposal boxes are sent to the region of interest pooling layer to produce fixed-length output. Finally, the proposed feature maps are judged by the fully connected and softmax layers for specific categories, and the probability vector of each category is generated. Moreover, an accurate target detection box is regressed using the bounding box regression layer.ResultThe sizes of the images in the dataset to be detected range from 150×150 to 350×250 pixels, and each image contains several different categories of bubbles, slags, and cracks. The trained model can detect the defect images and effectively classify them into the various categories of defect targets. With anchors selected according to the GT aspect ratios, the trained model can detect bubbles whose GT area (that is, the GT area rather than the defect area) is as small as 9×9 pixels and slag inclusions as small as 9×10 pixels.
This model can accurately and quickly mark the locations of bubbles, slags, and cracks with a detection accuracy of up to 96%, and the average detection time per image is 86 ms.ConclusionThis work uses a deep convolutional neural network to extract defect features automatically and to classify and identify defects on the basis of the extracted features, thereby avoiding the manual selection of target features in traditional defect detection. Moreover, the process of defect recognition and localization becomes increasingly automated, thus reducing detection time and improving efficiency. Anchors selected according to the GT aspect ratios are more pertinent for small targets, such as bubbles and slags, as well as for relatively large targets, such as cracks, and the border regression is accurate. The collected industrial CT images are grayscale pictures, whose information is not as rich as that of RGB pictures. Thus, preprocessing the CT images before training can improve the detection accuracy of the trained models; the overall detection accuracy is 2% higher than that without preprocessing. The faster R-CNN defect detection algorithm proposed in this work has a fine detection effect and can detect targets in industrial CT images quickly and accurately. If additional types of defects need to be detected, a new detection model can be obtained by fine-tuning the network. This work provides a new and efficient solution for the defect detection of industrial CT images.
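      The abstract names Laplacian sharpening and homomorphic filtering as the preprocessing steps. The sketch below shows one plausible implementation in Python with OpenCV and NumPy; the filter constants are illustrative assumptions, not the paper's values.

      # Sketch of CT image enhancement: Laplacian sharpening followed by
      # homomorphic filtering of a grayscale slice (constants assumed).
      import cv2
      import numpy as np

      def enhance_ct(img):
          imgf = img.astype(np.float32)
          # Laplacian sharpening: subtracting the Laplacian emphasizes edges.
          sharp = np.clip(imgf - cv2.Laplacian(imgf, cv2.CV_32F), 0, 255)

          # Homomorphic filtering: log -> FFT -> high-frequency emphasis -> exp.
          F = np.fft.fftshift(np.fft.fft2(np.log1p(sharp)))
          rows, cols = img.shape
          y, x = np.ogrid[:rows, :cols]
          d2 = (y - rows / 2) ** 2 + (x - cols / 2) ** 2
          gl, gh, c, d0 = 0.5, 2.0, 1.0, 30.0   # illustrative filter constants
          H = (gh - gl) * (1 - np.exp(-c * d2 / d0 ** 2)) + gl
          out = np.expm1(np.real(np.fft.ifft2(np.fft.ifftshift(H * F))))
          return np.clip(out, 0, 255).astype(np.uint8)

      ct = (np.random.rand(256, 256) * 255).astype(np.uint8)  # toy CT slice
      enhanced = enhance_ct(ct)  # would feed the faster R-CNN training set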
      关键词:deep learning;faster regions with convolutional neural network features (faster R-CNN);convolutional neural networks;defect detection;industrial computed tomography

      Computer Graphics

    • Yang Zhou, Xiaofei Hu, Caijiao Jin, Long Zhang, Andong Chen
      Vol. 23, Issue 7, Pages: 1072-1080(2018) DOI: 10.11834/jig.170501
      Graphic-image mixed method for large-scale building rendering
      摘要:ObjectiveConstructing a smart city is an effective approach to achieving sustainable development and enhancing the comprehensive competitiveness of cities. A 3D city visualization system is an important part of smart city applications. The structure of a 3D city building model is complex, and the data are exceedingly large. One of the bottlenecks restricting the dissemination of digital 3D city applications is the lack of an efficient and mature visualization system. Level of detail (LOD) and out-of-core algorithms are the key points in many existing large-scale building scene rendering techniques. These algorithms improve the rendering efficiency by reducing the amount of data drawn in each frame. However, when the scene is sufficiently large, even complex optimization algorithms cannot achieve improved results, and the rendering of 3D city buildings becomes extremely slow.MethodA novel idea was developed based on the traditional approach. We present a graphics-and-image mixed method for large-scale building scene rendering. First, the view frustum is split into three regions along the z axis: the interesting region, the less interesting region, and the uninteresting region, from near to far. We adopt a different rendering method in each region. In the interesting region, the building models are drawn on the screen by the traditional graphics method. In the less interesting and uninteresting regions, image-based rendering and off-screen rendering technologies are used: the frame buffer object and render-to-texture technologies are applied, and the building models are drawn on a texture image to which the render buffer is attached. At the end of each frame, the texture image is blended with the screen in consideration of the depth information to obtain the final rendering result. We design a 3D building tile model to improve the efficiency of data loading and rendering. The city extent is split with grids; the buildings in the same grid cell compose a tile, and an R-tree index is used to search these tiles. The geodatabase 3DCityDB is used for data processing.ResultThe algorithm is tested on a huge public CityGML model dataset of urban areas of New York, which includes 188195 building models at the LOD0 and LOD1 levels. Several groups of experiments are conducted to compute the scene drawing frame rate, with the amount of building model data varying in each experiment. The frame rate exceeds 20 frames/s in each scene drawing experiment. We also compare the visualization result with the Cesium platform. The LOD algorithm used in Cesium extracts only some distant buildings to improve the rendering efficiency, so some models are lost in the scene. In contrast, our technology can draw the entire scene without any loss. The experimental result is acceptable and shows that the system operates smoothly.ConclusionThe graphics-image mixed scene rendering method maintains the continuity of scene roaming and, at the same time, has the advantage of image-based rendering technology, in which the rendering frame rate is independent of the scene scale. The experimental results demonstrate that the algorithm can improve the data-bearing capacity of the visualization system, especially for large-scale low-resolution building scenes. The system can roam smoothly through huge building data on relatively low-performance hardware and render the scene without visual loss.
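      The following Python sketch illustrates the three-region split of the view frustum described above. The distance thresholds and the tile structure are illustrative assumptions; the actual off-screen passes would be carried out with frame buffer objects and render-to-texture.

      # Minimal sketch of the three-region split along the view direction;
      # NEAR_LIMIT/FAR_LIMIT and the tile fields are assumed values.
      from dataclasses import dataclass

      @dataclass
      class BuildingTile:
          tile_id: int
          distance: float          # tile distance from the camera, in metres

      NEAR_LIMIT = 500.0           # end of the interesting region (assumed)
      FAR_LIMIT = 2000.0           # end of the less interesting region (assumed)

      def render_path(tile: BuildingTile) -> str:
          """Pick a rendering method per tile: geometry near, texture far."""
          if tile.distance <= NEAR_LIMIT:
              return "geometry"             # draw full models on screen
          if tile.distance <= FAR_LIMIT:
              return "render_to_texture"    # off-screen FBO pass, blended later
          return "render_to_texture_coarse" # same pass, updated less often

      tiles = [BuildingTile(1, 120.0), BuildingTile(2, 900.0), BuildingTile(3, 4200.0)]
      print([render_path(t) for t in tiles])
      # ['geometry', 'render_to_texture', 'render_to_texture_coarse']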
      关键词:3D city;graphics and image mixed;large-scale scene;huge data;urban building
    • Guo Zheng, Li Zhang, Shijie Zhang, Zhuangping Du, Yi Liu, Jieqing Tan
      Vol. 23, Issue 7, Pages: 1081-1090(2018) DOI: 10.11834/jig.170559
      Grouped progressive iterative approximation method of data fitting
      摘要:ObjectiveThe progressive iterative approximation (PIA) method has a wide range of applications in solving interpolation and fitting problems in computer-aided design. The PIA format presents an intuitive way of data fitting by adjusting the control points iteratively and generates a sequence of curves/surfaces with increasing precision. By the PIA property, the limit of the curve/surface sequence interpolates the initial data points if the blending basis is normalized and totally positive and its corresponding collocation matrix is nonsingular. To increase the flexibility of the PIA method in the large-scale fitting of data points, a new PIA method based on grouping is proposed in this work.MethodFirst, the initial data points to be fitted are divided into several groups. Second, by applying the PIA or least-squares PIA (LSPIA) method to each group of data points separately, we obtain a sequence of curves/surfaces with the PIA property for each group. Then, by adjusting the control points on the boundary according to the continuity conditions, a blending algorithm is applied to these separate curves/surfaces. Thus, we finally acquire one whole curve/surface piece and ensure its continuity. Moreover, by grouping the data points, we reduce the computation and improve the iteration efficiency.ResultWe apply the PIA, LSPIA, and proposed methods to a single dataset to fit the data points. The grouped PIA method is convergent because the PIA or LSPIA method is used to fit each group of data points. Compared with the fitting error of the PIA method, that of the grouped PIA method decreases with the number of groups; that is, the more groups the data are divided into, the smaller the fitting error. The fitting error of the grouped PIA method is approximately half that of the LSPIA method.ConclusionThis study combines the grouping idea with the PIA method and proposes a grouped PIA method for fitting data sets. This method not only brings enhanced flexibility to the large-scale fitting of data points but also delivers a higher convergence rate and smaller errors than the PIA and LSPIA methods. Numerous numerical examples are presented to show the effectiveness of this technique.
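      The sketch below illustrates a single group's PIA iteration in Python, assuming a Bernstein collocation matrix (normalized and totally positive, so the iteration converges). In the grouped method, this iteration would run on each group, and the boundary control points would then be blended to enforce continuity.

      # One group's PIA iteration on toy data; the Bernstein basis and the
      # sine test curve are illustrative choices, not the paper's examples.
      from math import comb
      import numpy as np

      def pia_fit(Q, B, iters=500):
          """Progressive iterative approximation: P <- P + (Q - B @ P)."""
          P = Q.copy()                 # start the control points at the data
          for _ in range(iters):
              P += Q - B @ P           # adjust each control point by its residual
          return P

      n = 6
      t = np.linspace(0.0, 1.0, n)     # data parameters
      B = np.array([[comb(n - 1, j) * ti**j * (1 - ti)**(n - 1 - j)
                     for j in range(n)] for ti in t])   # Bernstein collocation
      Q = np.column_stack([t, np.sin(np.pi * t)])       # data points to fit
      P = pia_fit(Q, B)
      print(np.abs(B @ P - Q).max())   # fitting error shrinks toward zero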
      关键词:geometric iterative methods;grouped progressive iteration;blending algorithm;G2 continuity;iteration efficiency   