Latest Issue

    Vol. 22, No. 4, 2017
    • Statistical analysis of the publication of

      Zhang Yujin
      Vol. 22, Issue 4, Pages: 415-421(2017) DOI: 10.11834/jig.20170401
      Abstract: Twenty years after the journal's founding, its development over the past two decades is reviewed, its current status is summarized, and its future directions are considered. Statistics on the publication data and state of affairs in the first 20 years since the journal's inception in 1996 are compiled and analyzed, covering the circumstances of its creation, the columns established (with the number of papers in each) and their topics, the general situation and characteristics reflected in the survey series published in the journal, as well as the numbers of issues, pages, and articles (including the pages per issue, articles per issue, and pages per article) and their changes and trends. These statistics are compared with those of 14 other image engineering journals, reflecting and establishing the technological position and academic level of this journal in the relevant field. After 20 years of evolution and progress, the journal has become an important publication source in its specialized field; nevertheless, requirements and goals for further improvement remain.
      Keywords: academic periodical; statistics of publication data; columns; amount of literature; trend analysis; bibliometrics; impact factor
    • Xu Shaoping, Zhang Xingqiang, Jiang Yinnan, Tang Yiling, Jiang Shunliang
      Vol. 22, Issue 4, Pages: 422-434(2017) DOI: 10.11834/jig.20170402
      摘要:Observed images can be easily contaminated by various noises during acquisition or transmission. As an important preprocessing module for various image processing systems, image denoising has been explored extensively in the last few decades. Various image denoising algorithms have been developed to improve the quality of images corrupted by some form of noise model, which is frequently assumed to be additive white Gaussian noise in the literature. As the core parameter of the non-blind block-matching and 3D filtering (BM3D) algorithm, noise level (i.e., variance) should be set manually in actual applications. This procedure significantly affects the noise reduction performance of the BM3D algorithm and limits its application scope due to inaccurate noise level estimation. To resolve this problem, a novel local means estimation (LME) algorithm that is utilized as the preprocessing module of the BM3D algorithm is proposed. In this work, we focus on solving the problem based on the quality-aware feature extraction of natural scene statistics (NSS) and local means techniques, which we can apply to automatically predict the noise level parameter with high accuracy and efficiency. Research on NSS has clearly demonstrated that clean images of natural scenes belong to a small set of space of all possible images and exhibit strong predictable statistical regularities in the spatial or frequency domain that can distinguish them from corrupted ones. By contrast, nonlocal means-based estimation exhibits the right features that interest us, such as its conceptual simplicity and effectiveness. In particular, several widely representative and clean images were selected and corrupted by Gaussian noise with different variances to constitute a set of distorted images. The sub-band coefficients of a corrupted image obtained from wavelet transform over three scales and three orientations were parameterized using generalized Gaussian distribution. These estimated parameters were used to form a feature vector that described image noise level. The feature vector extracted from each distorted image belonging to the distorted image set and the corresponding Gaussian noise variance constituted the feature vector database. The proposed quality-aware features have extremely low computational complexity, thereby making them appropriate for time-constrained applications. During the noise reduction stage, the feature vector of a noisy image to be denoised was extracted using the same feature extraction approach. We selected the feature vectors and the corresponding noise level values that were similar to the extracted feature of the noisy image in the feature vector database to estimate its variance using the LME approach. The estimated variance was subsequently used as the input parameter of the BM3D algorithm. The BM3D algorithm was transformed into a blind denoising algorithm called BM3D based on the LME algorithm (BM3D-LME). The accurate estimation of noise level from a single noisy image is of fundamental interest in a wide variety of digital image processing applications. This procedure is highly important for tasks such as denoising, super-resolution, and segmentation. We verified the accuracy, robustness, and effectiveness of the proposed method on a large number of representative images from several benchmark databases. Experimental results show that the LME algorithm can accurately and rapidly estimate the noise level in any image to be denoised. 
The actual noise reduction effect of the BM3D algorithm is effectively improved with the aid of LME, and its application scope is also expanded.  
      Keywords: noise level estimation; feature vector extraction; local means estimation; block-matching and 3D filtering (BM3D); blind image denoising
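      The following is a minimal sketch of the two ingredients the abstract describes, namely NSS features obtained by fitting a generalized Gaussian to wavelet sub-band coefficients and a local-means estimate of the noise level from a precomputed feature database. The library choices (PyWavelets, SciPy), the k-nearest-neighbour weighting, and all parameter values are illustrative assumptions rather than the paper's implementation; the estimated sigma would then be fed to a BM3D implementation as its noise-level input.

```python
# Hedged sketch: NSS feature vector (GGD fits of wavelet sub-bands) and a
# local-means noise-level estimate from a precomputed feature database.
# PyWavelets/SciPy and the k-NN weighting are assumptions, not the paper's code.
import numpy as np
import pywt
from scipy.stats import gennorm

def nss_features(img, wavelet="db2", levels=3):
    """Fit a generalized Gaussian to each detail sub-band (3 scales x 3 orientations)."""
    coeffs = pywt.wavedec2(img.astype(float), wavelet, level=levels)
    feats = []
    for detail in coeffs[1:]:           # skip the approximation band
        for band in detail:             # horizontal, vertical, diagonal
            beta, _, scale = gennorm.fit(band.ravel(), floc=0.0)
            feats.extend([beta, scale])
    return np.asarray(feats)

def local_means_estimate(query_feat, db_feats, db_sigmas, k=5):
    """Average the noise levels of the k most similar database entries."""
    d = np.linalg.norm(db_feats - query_feat, axis=1)
    idx = np.argsort(d)[:k]
    w = 1.0 / (d[idx] + 1e-8)
    return float(np.sum(w * db_sigmas[idx]) / np.sum(w))

# usage: sigma_hat = local_means_estimate(nss_features(noisy), db_feats, db_sigmas)
# sigma_hat would then be passed to a BM3D implementation as its noise-level input.
```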
    • Improved search algorithm for compressive sensing image recovery based on L

      Jiang Yuan, Miao Shengwei, Luo Huazhu, Shen Pei
      Vol. 22, Issue 4, Pages: 435-442(2017) DOI: 10.11834/jig.20170403
      Abstract: As one of the key technologies in compressed sensing, the reconstruction algorithm plays a central role. Commonly used reconstruction algorithms include non-convex optimization based on the L0 norm and convex optimization based on the L1 norm. However, these algorithms exhibit shortcomings, such as low reconstruction precision and extremely long running time. To overcome these limitations and improve the precision and efficiency of existing L1-norm-based compressed sensing image reconstruction, an improved algorithm is presented in this study. Sequential quadratic programming (SQP) is one of the most effective methods developed in recent years for solving constrained nonlinear programming problems. Compared with other algorithms, its most apparent advantages are good convergence, high computational efficiency, and strong boundary search capability. For many problems, however, the SQP method based on the Lagrangian function does not yield a positive definite Hessian matrix; therefore, a value function is introduced, the Hessian-matrix correction of the SQP method is modified, and block-based compressed sensing of the image is integrated. On this basis, an image reconstruction algorithm for L1-norm compressed sensing is proposed. At a sampling rate of 40%, the signal-to-noise ratio (SNR) of the proposed algorithm is 34.28 dB, which is better than that of the block orthogonal matching pursuit (BOMP) algorithm (33.76 dB) and that of the algorithm in which the penalty function is used as the correction method (30.23 dB); its computation time is 190.55 s, which is faster than those of the BOMP algorithm (302.14 s) and the penalty-function variant (586.15 s). When the sampling rate is 50%, the SNR of the proposed algorithm is 35.42 dB, better than those of the BOMP algorithm (34.56 dB) and the penalty-function variant (31.38 dB); its computation time is 196.67 s, faster than those of the BOMP algorithm (617.62 s) and the penalty-function variant (1 071.15 s). When the sampling rate is 60%, the SNR of the proposed algorithm is 36.33 dB, better than those of the BOMP algorithm (35.18 dB) and the penalty-function variant (33.57 dB); its computation time is 201.72 s, faster than those of the BOMP algorithm (1 136.29 s) and the penalty-function variant (1 505.35 s). At a sampling rate of 70%, the SNR of the proposed algorithm is 38.62 dB, better than those of the BOMP algorithm (37.65 dB) and the penalty-function variant (35.17 dB); its computation time is 214.68 s, faster than those of the BOMP algorithm (1 802.42 s) and the penalty-function variant (2 415.81 s). Experimental results show that, at the same sampling rate, the improved algorithm is superior to the BOMP algorithm and to the penalty-function variant in both reconstruction precision and computation time, and that a higher sampling rate yields higher reconstruction precision and shorter reconstruction time. Convex optimization for image reconstruction under compressed sensing theory outperforms non-convex optimization, yet its reconstruction precision remains limited and its reconstruction speed slow. Experimental results indicate that the proposed algorithm generally improves both reconstruction precision and computation time. Further work should pursue algorithms with still higher precision and faster image reconstruction.
      Keywords: compressive sensing; image reconstruction; norm; sampling rate; value function; sequential quadratic programming
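      As a rough illustration of the sequential quadratic programming idea invoked above, the sketch below recovers a sparse toy signal with SciPy's SLSQP solver, using the standard x = u - v splitting so the L1 objective becomes smooth. The toy sizes, the Gaussian sampling matrix, and the use of SLSQP in place of the paper's modified-Hessian SQP with block-based sampling are all assumptions.

```python
# Hedged sketch: recover a sparse block with an SQP solver (SciPy's SLSQP),
# using the x = u - v splitting so the L1 objective is smooth. This only
# illustrates the SQP idea; it is not the paper's modified-Hessian algorithm.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, m, k = 64, 26, 5                              # toy signal length, measurements, sparsity
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
Phi = rng.standard_normal((m, n)) / np.sqrt(m)   # assumed Gaussian sampling matrix
y = Phi @ x_true

def objective(z):                                # z = [u, v], both nonnegative
    return z.sum()                               # ||x||_1 = sum(u) + sum(v)

constraints = {"type": "eq", "fun": lambda z: Phi @ (z[:n] - z[n:]) - y}
bounds = [(0, None)] * (2 * n)
res = minimize(objective, np.ones(2 * n), method="SLSQP",
               bounds=bounds, constraints=constraints, options={"maxiter": 500})
x_hat = res.x[:n] - res.x[n:]
print("reconstruction error:", np.linalg.norm(x_hat - x_true))
```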
    • Zuo Liwen, Luo Ting, Jiang Gangyi, Gao Wei, Hu Tianyou
      Vol. 22, Issue 4, Pages: 443-451(2017) DOI: 10.11834/jig.20170404
      摘要:Information hiding technology can effectively guarantee information security by concealing secret information imperceptibly in digital media through the redundancy of multimedia data and human eye masking characteristics. The combination of a video coding process and standard is a critical approach to video information hiding technology because digital video is commonly stored and transmitted in compressed form. The next-generation high-efficiency video coding (HEVC) standard plays an important role in high-definition video applications due to its high encoding efficiency. Consequently, the information hiding method for HEVC has a theoretical value and practical significance. However, existing HEVC information hiding technology exhibits deficiencies, such as the rapid increase in bit rate and the degradation of video quality. Accordingly, a new large-capacity information hiding method for HEVC based on the just noticeable coding distortion (JNCD) model is proposed. The JNCD model is presented as a type of visual perception model for HEVC by considering the blurring and block distortion of the coding process. This model can effectively remove human perception redundancy and achieve higher subjective perceptual quality at the same bit rate. The optimal quantization parameter (QP) value for each coding unit is computed by using this model. An exploiting modification direction algorithm is utilized to adjust the optimal QP values and embed secret information. The algorithm maximizes the modified direction, thereby increasing embedding capacity from the usual four states to five states, in which a maximum of one bit data is changed for the two consecutive coded QP values. The extraction of information can be performed directly without referring to the original video, which can be satisfied with real-time and blind extraction performance. The HEVC reference software HM16.0 is used, and five sequences with different resolutions are tested. After the secret information is embedded, experimental results show that the average PSNR of the video test sequence is 41.16 dB. Unlike existing information hiding methods, our approach does not only maintain good subjective and objective video quality, but also increases information hiding capacity by approximately 2 times, on average. The proposed method can effectively increase embedding capacity, prevent bit rate increases, and ensure the quality of the original video image. A secret key is used to scramble and encrypt the secret information, which increases information security. Only the user who holds the key can decrypt and obtain the secret information at the decoding side. Embedding the secret information into the frame with the highest priority prevents information loss caused by packet loss and dropped frames in the channel transmission. This process also effectively guarantees video perception quality. The proposed method fulfills the invisibility, security, and real-time requirements of information hiding, thereby making it suitable for military organizations, financial institutions, commercial markets, and other secure communication fields.  
      Keywords: information hiding; high efficiency video coding (HEVC); just noticeable coding distortion (JNCD); quantization parameter (QP); exploiting modification direction (EMD) algorithm
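      A minimal sketch of the exploiting-modification-direction (EMD) step named in the abstract, applied to a pair of carrier values such as two consecutive QP values: one base-5 digit is embedded by changing at most one of the two values by ±1. The function names are hypothetical, and the paper's QP range handling and JNCD constraints are not modeled.

```python
# Hedged sketch of EMD embedding for a pair of carrier values (e.g., two
# consecutive QP values). One base-5 digit is embedded by modifying at most
# one value by +/-1. Names are hypothetical illustrations.
def emd_extract(q1, q2):
    """Extraction function f(q1, q2) = (q1 + 2*q2) mod 5."""
    return (q1 + 2 * q2) % 5

def emd_embed(q1, q2, digit):
    """Embed a base-5 digit, changing at most one of the two values by +/-1."""
    assert 0 <= digit < 5
    s = (digit - emd_extract(q1, q2)) % 5
    if s == 0:
        return q1, q2
    elif s == 1:
        return q1 + 1, q2
    elif s == 2:
        return q1, q2 + 1
    elif s == 3:
        return q1, q2 - 1
    else:  # s == 4
        return q1 - 1, q2

# usage: stego = emd_embed(32, 30, digit=3); assert emd_extract(*stego) == 3
```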
    • Wang Ying, Yu Mei, Ying Hongwei, Jiang Gangyi
      Vol. 22, Issue 4, Pages: 452-462(2017) DOI: 10.11834/jig.20170405
      摘要:At present, 3D videos have become extensively integrated into the daily lives of people due to the immersive visual experience that they provide to users. However, viewers can experience visual discomfort when watching 3D videos, and even suffer from eye fatigue, headache, nausea, and other symptoms due to defects in 3D imaging technology. Therefore, the study of visual comfort enhancement methods for stereoscopic images or videos is highly significant to improve stereoscopic display technology and provide users with higher-quality 3D vision service. The factors that can cause visual discomfort when people watch stereoscopic images or videos include the followings: vergence-accommodation conflict, excessive cross and non-cross disparities, disparity distribution, spatial frequency, mismatch between left and right images, and object movement. Vergence-accommodation conflict is the fundamental cause of visual discomfort. Binocular vergence-accommodation conflict is characterized by a large disparity that occurs in 3D space. If the disparity is outside the fusion range, then the viewer cannot fuse the left and right images into a stereoscopic image, and instead, will see an unclear crosstalk image, thereby resulting in severe visual fatigue. Disparity distribution is also one of the main factors that affect visual comfort. Excessive cross disparity is more likely to cause visual discomfort than excessive non-cross disparity. When the entire image is located in front of the screen, visual comfort will be lower compared with when the entire image is positioned behind the screen. The disparity distribution of an image is more concentrated on the zero-disparity plane, thereby making the image more comfortable to view. As dispersion decreases, viewing the image becomes more comfortable. Spatial frequency also influences visual comfort by affecting binocular fusion limit. An image with high spatial frequency causes a higher degree of visual discomfort than an image with low spatial frequency. Disparity adjustment is the main method that can enhance the visual comfort of stereoscopic images because the vergence-accommodation conflict caused by the increased disparity is the main factor that leads to visual discomfort. Disparity adjustment methods can be divided into two categories: disparity shifting and disparity scaling. A disparity shifting method adjusts disparity by shifting the zero-disparity plane of the original image, thereby keeping the disparity range unchanged. Although this method has low computational complexity, simultaneously ensuring maximum cross disparity and non-cross disparity within the comfort zone is difficult regardless of how disparity is moved when the original disparity range exceeds a certain range of comfortable viewing area. Thus, visual discomfort remains unavoidable in this case. By contrast, the disparity range of the original scene can be linearly or nonlinearly adjusted into the comfort area by using a disparity scaling method. In general, excessive vergence-accommodation conflict can be avoided effectively by reducing the disparity range of the scene. However, when a large-scale disparity reduction is performed, the overall perceived depth of the stereoscopic image is significantly decreased, and an unnatural visual effect occurs due to the limited range of the comfortable viewing area. 
A new visual comfort enhancement method for stereoscopic images is proposed by combining global linear and local nonlinear disparity remapping based on the effect of disparity on visual comfort. This method can prevent visual discomfort when viewing stereoscopic images; it also balances the improvement of visual comfort of stereoscopic images and the weakening of the 3D sense of scenes. First, an objective visual comfort assessment model is constructed to automatically predict the visual comfort of stereoscopic images and to judge the improvement of the visual comfort of stereoscopic images during disparity adjustment. On the one hand, when binocular fusion limitation is considered, the global visual comfort features of stereoscopic images are extracted by combining spatial frequency and disparity. On the other hand, we perform a disparity statistical analysis on stereoscopic significant regions and obtain local visual comfort features based on the hypothesis that the human eye tends to pay excessive attention to perceived salient regions. Support vector regression is adopted in this study to construct the objective visual comfort prediction model for stereoscopic images by establishing the mapping relationship between features and subjective scores. Then, the visual comfort of the input stereoscopic image is analyzed using the constructed prediction model. A two-stage disparity remapping strategy is designed for less-comfortable stereoscopic images. This strategy consists of the global linear adjustment of the disparity range and the local nonlinear adjustment of the disparity in the extracted potentially less-comfortable regions. The global disparity remapping of the input disparity map is performed during the first stage to adjust the uncomfortable stereoscopic images to a relatively comfortable degree. The global disparity linear iterative adjustment process is performed if the predicted visual comfort objective score is less than the preset threshold. Only the global features are applied at this point to construct the visual comfort prediction function. Local nonlinear disparity remapping is then performed during the second stage to further enhance the viewing comfort of the stereoscopic image and maintain the 3D sense of the scene. The disparity of the potentially less-comfortable regions extracted from the disparity map after global linear remapping is adjusted via nonlinear iteration until the predicted visual comfort objective score is higher than the preset target threshold. The visual comfort of the adjusted stereoscopic image is predicted in conjunction with global and local features at this point. Lastly, an updated comfortable stereoscopic image is reconstructed via a rendering technique according to the remapped disparity map. A subjective evaluation experiment is designed on the IVY Lab stereoscopic image database to verify the effectiveness of the proposed method in improving the visual comfort and maintaining the 3D sense of stereoscopic images. Experimental results show that the proposed method can more effectively enhance the visual comfort of less-comfortable stereoscopic images while maintaining the 3D sense of scenes compared with state-of-the-art stereoscopic image visual comfort enhancement methods. The proposed method can automatically implement global linear and local nonlinear disparity remapping processes based on the visual comfort prediction model constructed with different features of stereoscopic images. 
The proposed method can realize the purpose of improving the visual comfort of stereoscopic images under the premise of ensuring 3D sense, which enhances the overall 3D experience of stereoscopic images.  
      Keywords: stereoscopic image; visual comfort enhancement; objective prediction model; disparity remapping; three-dimensional sense
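      For illustration, the sketch below implements the two remapping primitives the abstract distinguishes: a global linear remap of the disparity range into a comfort zone and a local nonlinear compression applied only inside flagged regions. The comfort limits and the power-law form are assumptions; the paper instead tunes both stages iteratively against its learned comfort predictor.

```python
# Hedged sketch of the two disparity-remapping primitives: a global linear
# remap into an assumed comfort zone, and a local power-law compression
# applied only inside flagged (less comfortable) regions.
import numpy as np

def global_linear_remap(disp, comfort_min=-20.0, comfort_max=10.0):
    """Linearly map [disp.min(), disp.max()] into [comfort_min, comfort_max]."""
    d_min, d_max = float(disp.min()), float(disp.max())
    scale = (comfort_max - comfort_min) / max(d_max - d_min, 1e-6)
    return comfort_min + (disp - d_min) * scale

def local_nonlinear_remap(disp, region_mask, gamma=0.7):
    """Compress large disparities inside region_mask with a power law."""
    out = disp.astype(float).copy()
    d = out[region_mask]
    out[region_mask] = np.sign(d) * np.abs(d) ** gamma
    return out

# usage: disp2 = local_nonlinear_remap(global_linear_remap(disp_map), less_comfortable_mask)
```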
    • Wang Huabin, Lu Cheng, Zhou Jian, Tao Liang, Shi Hanqin
      Vol. 22, Issue 4, Pages: 463-471(2017) DOI: 10.11834/jig.20170406
      摘要:The multiplicative update rule is generally used in non-negative matrix factorization (NMF). Such rule exhibits simple implementation characteristics, and thus, the linear complexity for each iteration is frequently applied because it is more stable than Newton's method. However, the multiplicative update rule also has disadvantages, such as slow convergence, asymptotic convergence to zero, and poor local optima. In particular, when the size of the data to be processed is large, the multiplicative update rule exhibits extremely low timeliness, and thus, it cannot be applied to several real-time applications, such as online object tracking. To address these limitations, an orthogonal projection NMF method based on alternating direction method of multipliers is proposed. The traditional NMF algorithm does not guarantee the availability of good partial-based representation; hence, a non-negative projection matrix and an orthogonal constraint are introduced. An orthogonal projection NMF (OPNMF), which can reduce computation complexity, is applied. Simultaneously, the bases exhibit high orthogonality due to the orthogonal and sparse characteristics of OPNMF. Therefore, obtaining an excellent part-based representation of the object becomes possible. As a new development of the Lagrange augmentation algorithm, the alternative direction method of multipliers (ADMM) is a simple and effective approach to solve the separable convex programming problem (particularly large-scale problems). One of the advantages of ADMM is that it can separate the objective function of the original problem into several sub-problems, which are easier to find local solutions via the separability of original function, and thus, an optimal solution is obtained. Compared with the NMF based multiplicative update rule, OPNMF based on the alternative direction method of multipliers does not only converge to the global solution, but also requires less convergence time. The derivation of the proposed algorithm is presented in this study. First, the solution of the original objective function is decomposed into the alternative optimization of different sub-problems based on the orthogonality and sparsity of OPNMF. The augmented Lagrangian equation of the original objective function is established to equally represent the sub-problems of the original problem by introducing auxiliary variables. Then, the main and dual variables of the transformed equation are optimized alternately. In particular, the partial derivatives of these variables are used for each equation to find the current optimal solution. Finally, the corresponding update rules are derived and the iterative process of each variable is updated alternately to obtain the optimal solution. Four matrices with different sizes are selected for the simulation to compare the performance of the ADMM algorithm with the multiplicative update rule in OPNMF. Experimental results show that the proposed method clearly outperforms the traditional approach in terms of convergence speed and accuracy. In particular, as the size of the matrix increases, the convergence rate of the algorithm becomes considerably higher than that of the multiplicative update rule. In the second experiment, we apply the proposed method to object tracking, which is a classical study area in computer vision. The observation model of the moving object is represented by the bases of OPNMF, i.e., the candidate object in each frame is represented by the linear combination of the basis vectors. 
During the tracking stage, the observation model is updated in time to avoid tracking drift caused by continuous appearance changes of the target object. A new template-updating strategy based on ADMM, which combines the part-based representation of OPNMF with the update speed of ADMM, is also proposed. The proposed method is compared with OPNMF based on the multiplicative update rule and with three other state-of-the-art object tracking algorithms, and the experimental results confirm its effectiveness. Its tracking speed is approximately 3.8 times that of OPNMF based on the multiplicative update rule. The overlap rate of the proposed method is approximately 0.73, which is better than those of the other three object tracking algorithms. The advantage of the proposed method is particularly pronounced when the matrix is large. The ADMM method achieves a faster convergence rate and higher convergence precision when its penalty parameter is appropriately selected. Moreover, object tracking based on OPNMF, which benefits from the sparseness and orthogonality of the algorithm, can handle interference such as occlusion, scale change, and illumination change, thereby achieving robust tracking.
      Keywords: orthogonal projection nonnegative matrix factorization; alternating direction method of multipliers; multiplicative update rule; particle filter; object tracking
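      The sketch below shows the alternating-direction idea the abstract contrasts with multiplicative updates, applied to the coefficient subproblem of plain NMF by splitting H = Z with Z >= 0. It is only a conceptual illustration: the paper's orthogonal projective NMF constraints are not reproduced, and the penalty parameter rho is an assumed value.

```python
# Hedged sketch: ADMM loop for the H-subproblem of min ||X - W H||_F^2 with
# the splitting H = Z, Z >= 0. Illustrates the alternating-direction idea only;
# it is not the paper's orthogonal projective NMF formulation.
import numpy as np

def admm_nmf_H(X, W, rho=1.0, iters=50):
    k, n = W.shape[1], X.shape[1]
    H = np.zeros((k, n)); Z = np.zeros((k, n)); U = np.zeros((k, n))
    WtW, WtX = W.T @ W, W.T @ X
    A = WtW + rho * np.eye(k)                          # reused every iteration
    for _ in range(iters):
        H = np.linalg.solve(A, WtX + rho * (Z - U))    # least-squares step
        Z = np.maximum(H + U, 0.0)                     # projection onto Z >= 0
        U = U + H - Z                                  # scaled dual update
    return Z

# usage: H = admm_nmf_H(X, W); the basis W would be updated analogously.
```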
    • Global-local metric learning for person re-identification

      Zhang Jing, Zhao Xu
      Vol. 22, Issue 4, Pages: 472-481(2017) DOI: 10.11834/jig.20170407
      摘要:The task in person re-identification is to match snapshots of people from non-overlapping camera views at different times and places. Intra-class images from different cameras show varying appearances due to variations in illumination, background, occlusion, viewpoint, and pose. Feature representation and metric learning are two major research directions in person re-identification. On the one hand, some studies focus on feature descriptors, which are discriminative for different classes and robust against intra-class variations. On the other hand, numerous metric learning algorithms have achieved good performance in person re-identification. The comparison of all the samples with a single global metric is inappropriate for handling heterogeneous data. Several researchers have proposed local metric learning. However, these methods generally require complicated computations to solve convex optimization problems. To improve the performance of metric learning algorithms and avoid complex computation, this study applies the concept of local metric learning and combines global metric learning algorithms, such as cross-view quadratic discriminant analysis (XQDA) and metric learning by accelerated proximal gradient (MLAPG). In the training stage, all the samples are softly partitioned into several clusters using the Gaussian mixture model (GMM). Local metrics are learned on each cluster using metric learning methods, such as XQDA and MLAPG. Meanwhile, a global metric is also learned for the entire training set. In the testing stage, the posterior probabilities of the testing samples that are aligned to each GMM component are computed. For each pair of samples, the local metrics weighted by their posterior probabilities of GMM components and the global metric weighted by a cross-validated parameter are integrated into the final metric for similarity evaluation. In this manner, we use different metrics to measure various pairs of samples, which is more suitable for heterogeneous data sets. In particular, we also propose an effective local metric learning strategy for MLAPG by modifying the weights of the loss values of the sample pairs in the loss function with the posterior probabilities of the samples aligned to each GMM component. We conduct experiments on three challenging data sets of person re-identification (i.e., VIPeR, PRID 450S, and QMUL GRID). Experimental results show that the proposed approach achieves better performance compared with traditional global metric learning methods. It performs significantly better on the VIPeR data set, providing more complex variations of backgrounds and clothes than on the other data sets, thereby improving matching accuracy by approximately 2.0%. In addition, we also conduct experiments on different types of feature representations for person re-identification to verify the generalized effectiveness of the proposed method. The matching accuracy is improved by approximately 1.3% to 3.4% with different feature descriptors. This result shows that the proposed approach can improve performance regardless of which feature descriptor is used. We propose a novel framework for integrating global and local metric learning methods by taking advantages of both metric learning approaches. Numerous recent global metric learning approaches can be integrated into the proposed framework to obtain improved performance in the person re-identification problem. 
Compared with existing local metric learning approaches, the proposed framework integrates global metric learning methods flexibly and effectively, and it does not require the complicated computation that other local metric learning approaches involve. Moreover, the proposed metric learning framework can be applied to many feature representation approaches.
      Keywords: person re-identification; metric learning; local metric learning; integrated global-local metric learning; Gaussian mixture model
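      A minimal sketch of the test-time metric fusion described above: GMM posteriors weight per-cluster (local) Mahalanobis metrics, and a global metric is added with a cross-validated weight. scikit-learn's GaussianMixture and the plain Mahalanobis form stand in for the paper's XQDA/MLAPG-learned metrics; variable names are hypothetical.

```python
# Hedged sketch of the global-local metric fusion at test time. The local and
# global matrices M are assumed to have been learned beforehand (the paper uses
# XQDA/MLAPG); here they are plain Mahalanobis metrics for illustration.
import numpy as np
from sklearn.mixture import GaussianMixture

def fused_distance(x, y, gmm: GaussianMixture, local_Ms, global_M, alpha=0.5):
    """Local metrics weighted by GMM posteriors of x and y, plus a weighted global metric."""
    px = gmm.predict_proba(x[None, :])[0]
    py = gmm.predict_proba(y[None, :])[0]
    d = x - y
    dist = alpha * float(d @ global_M @ d)
    for k, M_k in enumerate(local_Ms):
        dist += px[k] * py[k] * float(d @ M_k @ d)
    return dist

# training-time sketch: gmm = GaussianMixture(n_components=K).fit(train_feats),
# then one metric is learned per soft cluster and one on the whole training set.
```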
    • Human action recognition based on privileged information

      Ling Peipei, Qiu Song, Cai Mingming, Xu Wei, Feng Ying
      Vol. 22, Issue 4, Pages: 482-491(2017) DOI: 10.11834/jig.20170408
      摘要:The study of human action recognition is an area with important academic and application values. It is widely applied to the fields of intelligent surveillance, video retrieval, human interaction, live entertainment, virtual reality, and health care. In human learning, a teacher can provide students with information hidden in examples, explanations, comments, and comparisons. However, the information offered by a teacher is seldom applied to the field of human action recognition. This study considers 3D depth features as privileged information to help solve human action recognition problems and to demonstrate the superiority of a new learning paradigm over the classical learning paradigm. This paper reports on the details of the new paradigm and its corresponding algorithms. The human body can be represented as an articulated system with rigid segments connected by joints. Human motion can be regarded as a continuous evolution of the spatial configuration of these rigid segments. With the recent release of depth cameras, an increasing number of studies have extracted the 3D positions of tracked joints to represent human activities, these studies have achieved relatively good performance. However, relative 3D algorithms have numerous application limits resulting from inconvenient equipment and costly computation. The extraction of joints from RGB video sequences is difficult, which limits recognition result. This study applies 3D depth features as privileged information to solve the aforementioned challenge. In particular, we apply a new skeletal representation that explicitly models the 3D geometric relationships among different body parts that use rotations and translations in 3D space in the lie group. We use different algorithms, including motion scale-invariant feature transform, motion boundary histograms, and different combined descriptors, for the basic 2D features to unite privileged information. Privileged information is available in the training stage, but not in the testing stage. Similar to the traditional classification problem, the new algorithm focuses on learning a new classifier, i.e., support vector machine+ (SVM+). The SVM+ algorithm, which considers both privileged and unprivileged information, is highly similar to SVM algorithms in terms of determining solutions in the classical pattern recognition framework. In particular, it finds the optimal separating hyperplane, which incurs a few training errors and exhibits a large margin. However, the SVM+ algorithm is computationally costlier than SVM. This study applies the new algorithm to the field of human activity recognition to provide convenience to the testing set because 3D information is only required in the training set. We evaluate our method in two challenge databases, namely, UTKinect-Action and Florence3D-Action, with three different 2D features. The SVM+ algorithm considers both 2D basic features and 3D privileged information, whereas SVM only uses 2D basic features. Results show that our proposed SVM+ outperforms SVM. Moreover, SVM+ is less sensitive to relevant parameters than SVM. This paper reports on the details of the recognition performance, varying numbers of training samples, different parameters, and confusion matrix for both SVM and SVM+ on the two datasets. The privileged information can help to reduce the noise of the original 2D basic features and increase the robustness of human activity recognition. 
The role of a teacher in providing remarks, explanations, and analogies is highly important. This study proposes a new human action recognition method based on privileged information. The experimental results of the two datasets show the effectiveness of our method in human action recognition. The proposed method is only required to extract 3D privileged information during the training process. A depth information acquisition device is not required during the testing process. This method exhibits high learning speed and low computational complexity. It can be extensively used in low-cost, real-time human action recognition.  
      Keywords: human action recognition; privileged information; support vector machine (SVM); support vector machine+ (SVM+); 3D Lie group features
    • Joint tracking of infrared-visible target via spatiogram representation

      Zhang Canlong, Tang Yanping, Li Zhixin, Wang Zhiwen, Cai Bing
      Vol. 22, Issue 4, Pages: 492-501(2017) DOI: 10.11834/jig.20170409
      Abstract: This study proposes a joint spatiogram tracker that addresses the real-time and accuracy requirements of multi-sensor tracking systems. In the proposed method, a second-order spatiogram is used to represent a target, and the similarity between the infrared candidate and its target model, together with that between the visible candidate and its target model, is integrated into a novel objective function for evaluating the target state. A joint target center-shift formula is established by applying a derivation similar to the mean shift algorithm to the objective function. Finally, the optimal target location is obtained recursively by applying the mean shift procedure. In addition, an adaptive weight adjustment method and a particle-filter-based model update method are designed. We tested the proposed tracker on four publicly available data sets. These data sets involve typical tracking difficulties, such as the absence of light at night; shading, clustering, and overlap among targets; and occlusion. We also compared our method with joint histogram tracking (JHT, a degenerate version of our method) and state-of-the-art algorithms, such as the L1 tracker (L1T) and the fuzzified region dynamic fusion tracker (FRD), on more than four infrared-visible image sequences. For the quantitative comparison, we use four evaluation metrics, namely, the average center offset error, the average overlap ratio, the average success rate, and the average calculation time. The corresponding results of each algorithm on the four data sets are as follows: proposed method (6.664, 0.702, 0.921, 0.009), L1T on the infrared target (25.53, 0.583, 0.742, 0.363), L1T on the visible target (31.21, 0.359, 0.459, 0.293), FRD (10.73, 0.567, 0.702, 0.565), and JHT (15.07, 0.622, 0.821, 0.001). In terms of overlap ratio, our method is on average approximately 23%, 14%, and 8% higher than L1T, FRD, and JHT, respectively. In terms of success rate, our method is on average approximately 32%, 46%, and 10% higher than the corresponding trackers. The proposed fusion tracker is superior to single-source trackers in handling cluttered backgrounds, illumination changes, and spatial information retention. It is suitable for tracking targets in situations such as the absence of light at night; shading, clustering, and overlap among targets; and occlusion. The method runs at 30 frames/s, allowing up to four targets to be tracked simultaneously in real time.
      Keywords: joint tracking; infrared; visible; spatiogram; particle filter
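      As an illustration of the representation used above, the sketch below computes a second-order spatiogram of a gray-level patch, i.e. per intensity bin the pixel count, spatial mean, and spatial covariance. The bin count and gray-level quantization are assumptions; the infrared and visible similarities computed from such spatiograms would then be summed into the joint objective that drives the mean-shift center update.

```python
# Hedged sketch: second-order spatiogram of a gray-level patch (per intensity
# bin: normalized count, spatial mean, spatial covariance). Bin count and
# quantization are illustrative assumptions.
import numpy as np

def spatiogram(patch, n_bins=16):
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([xs.ravel() / w, ys.ravel() / h], axis=1)   # normalized positions
    bins = np.clip((patch.ravel() * n_bins / 256).astype(int), 0, n_bins - 1)
    counts = np.zeros(n_bins)
    means = np.zeros((n_bins, 2))
    covs = np.tile(np.eye(2) * 1e-6, (n_bins, 1, 1))
    for b in range(n_bins):
        pts = coords[bins == b]
        counts[b] = len(pts)
        if len(pts) > 1:
            means[b] = pts.mean(axis=0)
            covs[b] = np.cov(pts.T) + 1e-6 * np.eye(2)
    counts /= max(counts.sum(), 1.0)
    return counts, means, covs
```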
    • Anti-interference contour tracking under prior model constraint

      Liu Daqian, Liu Wanjun, Fei Bowen
      Vol. 22, Issue 4, Pages: 502-515(2017) DOI: 10.11834/jig.20170410
      摘要:Target tracking plays an important role in computer vision, which is widely applied in intelligent traffic, robot vision, and motion capture. Experts and scholars have proposed numerous excellent target tracking algorithms in recent years to avoid the influence of illumination changes, target deformation, partial occlusion (even global occlusion), complex background, and other factors. One of the popular topics in the field of target tracking is determining how to deal with the change in target contour. A level set can better optimize the topology structure of a target, and thus, many researchers have adopted the level set method for the contour extraction and tracking of targets. In 2004, Freedman used the Bhattacharyya distance and Zhang used the Kullback-Leibler distance in 2005, respectively, to determine the target layout and locate the best candidate region. Accordingly, these researchers combined foreground/background matching flow and proposed a combined flow method. However, these two algorithms depend on the initial target selection. When the initial contour differs from the actual contour of the object, the algorithms will require multiple iterations to converge. Chiverton proposed an online contour tracking algorithm based on the learning model. This algorithm establishes a prior target model through initial target morphology and constrains the contour tracking process by using the target model. Ning proposed an approach that applied the morphological information of the initial delineation of the target to establish the prior model. This researcher also adopted the level set method for the implicit representation of the foreground and background regions of the target information. The distribution area of the foreground/background target is determined using the Bhattacharyya similarity measure to realize accurate tracking. Rathi adopted the geometric active contour model to track a deformed target that was moving fast in the framework of a particle filter. The algorithm does not only achieve affine transformation, but can also accurately estimate the non-affine transform target. The contour extraction methods based on a level set are extensively applied to tracking moving targets. Traditional methods can be easily affected by the local occlusion of other targets and the complex background. A novel tracking approach based on anti-interference contour tracking under the prior model constraint is proposed to solve the aforementioned problems. The proposed approach uses a simple model matching algorithm to track the previous five frames of the image sequences. The training sample set is established based on several super pixel blocks obtained via super pixel segmentation. The super pixel block sets with the same color feature are used to establish the cluster sets by using the mean shift algorithm. The confidence probability of each cluster is then calculated, and a prior model of the target is constructed according to the confidence degree of clusters. Subsequently, the target contour is extracted using the segmentation method of the level set. This study proposes a novel decision-making method to avoid the influences of partial occlusion and complex background. This method determines whether a shape prior model is required to constrain the level set evolution process, and thus, obtain more robust tracking results. Lastly, an appearance model online-updating algorithm is proposed. 
This algorithm can append the appropriate feature compensations to feature sets to improve the updating accuracy of the appearance model. The algorithm uses the evolution results of shape and color features, and then, the feature loss and redundant feature problems are effectively solved when the target is occluded. Six sets of common video sequences are used in the test to verify the performance of the proposed algorithm. The video sequence covers challenging factors, such as illumination change, partial occlusion, target deformation, and complex background. The algorithm is also compared with available contour tracking algorithms, such as the density matching and level set, the learning distribution metric, joint registration, and active contour segmentation. The proposed contour tracking algorithm can achieve the same or even higher tracking accuracy compared with excellent contour tracking algorithms. The average center errors in the video sequences Fish, Face1, Face2, Shop, Train, and Lemming are 3.46, 7.16, 3.82, 13.42, 14.72, and 12.47, respectively. The tracking overlap ratios of the aforementioned video sequences are 0.92, 0.74, 0.85, 0.77, 0.73, and 0.82, respectively. The average running speeds in the aforementioned video sequences are 4.27 frame/s, 4.03 frame/s, 3.11 frame/s, 2.94 frame/s, 2.16 frame/s, and 1.71 frame/s, respectively. Experiment results indicate that using the prior model constraint of the target and implementing decision-making in the contour extraction process provide the algorithm with accurate tracking and strong adaptability characteristics under the conditions of partial occlusion, target deformation, target rotation, and complex background. The characteristics of the proposed approach are as follows:1) a prior model of the target is built by training the sample set, removing the interference of the non-target information in the image, and providing the prior model with a more accurate description of the target;2) a decision-making method is proposed to judge whether a prior model is required. If the constraints of a prior model must be introduced, then the results of the shape subspace and the evolution in color space are fused in the level set segmentation process;3) an appearance model online-updating algorithm is proposed, which can append the appropriate feature compensations to the feature sets, thereby ensuring the accuracy of the model.  
      Keywords: prior model; level set; decision-making; feature compensation; contour tracking
    • Deconvolutional neural network for prostate MRI segmentation

      Zhan Shu, Liang Zhicheng, Xie Dongdong
      Vol. 22, Issue 4, Pages: 516-522(2017) DOI: 10.11834/jig.20170411
      摘要:Prostate cancer is one of the leading causes of deaths due to cancer among older men, and its diagnosis experiences many challenging problems. Imaging-based prostate cancer screening, such as magnetic resonance imaging (MRI), requires an experienced medical professional to extensively review the obtained data and perform a diagnosis. The first step in prostate radiation therapy is to identify the difference between the original image and the nearby prostate tissue. However, prostate MRI results face the problems of low organizational boundary contrast ratio and lack of effective areas. Manual segmentation will take considerable time, which cannot meet clinical real-time requirements. Although several methods presented in the MICCAI 2012 challenge achieved reasonable results, they highly depended on feature selection or statistical shape modeling performance, and thus, presented limited success. A segmentation algorithm for prostate MRI based on a deep deconvolutional neural network is proposed to solve the aforementioned deficiencies. Inspired by the latest deep learning technology, fully convolutional network, and DeconvNet, we present a multi-layer deconvolutional convolutional network to demonstrate that a deep neural network can dramatically increase the automated segmentation of prostate MRI images compared with systems based on handcrafted features. The deep neural network model exhibits strong feature learning and end-to-end training capacities, which provide better performance than former image processing techniques. Unlike an image classification task, each pixel in an MRI image is regarded as an object that should be classified. Hence, we obtain the final segmentation results by considering the prediction of prostate tissues as a two-stage classification task. This study presents a multi-layer convolutional network, which utilizes a convolution filter, a pooling layer, and a decoder network to transform an input MRI image into a probability map. A convolutional neural network is used in the training process of this model to extract highly distinct image features. Then, a deconvolution strategy is adopted to expand the feature map size and to maintain the sizes of the input image and the output probability map. The stacked convolution and deconvolution layers can maintain resolution size by adding a pad to the input image. In addition to achieving deeper network architecture, the stacked convolution layers exhibit strong robustness against overfitting. Finally, the probability map is used to train a softmax classifier and the final segmentation result is obtained. We replace the classical neuron activation function with a rectified linear unit in our model to speed up the training process and avoid the vanishing gradient. The Dice similarity coefficient is used as the loss function in our convolutional network to overcome the problem of low effective organization in the original image. The images provided in MICCAI 2012 exhibit varying sizes and resolutions, and thus, we preprocess the images and augment the size of the data set via multi-scale cropping and scale transformation to improve training reliability. All the experiments are performed on the MICCAI 2012 data set. The algorithm proposed in this study uses the Dice similarity coefficient and Hausdorff distance as evaluation metrics. The Dice similarity coefficient is over 89.75%, whereas Hausdorff distance is shorter than 1.3 mm, which can realize the segmentation accuracy of traditional methods. 
Furthermore, the processing time is shortened to within 1 min, which is clearly superior to those of other methods. The deep learning approach is gradually being applied to the medical field. This study introduces a new deep learning method that is used to segment prostate images. Both the qualitative and quantitative experiments show that the prostate segmenting method based on the deconvolutional neural network can segment MRI images accurately. The proposed method can attain higher segmentation accuracy than the traditional methods. All the calculations are performed on a graphics processing unit, and handling time is considerably shortened compared with those of other segmentation algorithms. Therefore, the proposed model is highly appropriate for the clinical segmentation of prostate images.  
      Keywords: prostate segmentation; magnetic resonance imaging; convolutional neural network; Dice similarity coefficient; Hausdorff distance
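      A minimal sketch of a soft Dice loss of the kind the abstract says is used to cope with the small proportion of prostate pixels. The PyTorch formulation and the smoothing constant are assumptions, not the paper's exact implementation.

```python
# Hedged sketch of a soft Dice loss for binary segmentation training.
import torch

def soft_dice_loss(pred, target, eps=1e-6):
    """pred: (N, 1, H, W) probabilities; target: (N, 1, H, W) binary masks."""
    pred = pred.reshape(pred.size(0), -1)
    target = target.reshape(target.size(0), -1)
    inter = (pred * target).sum(dim=1)
    dice = (2 * inter + eps) / (pred.sum(dim=1) + target.sum(dim=1) + eps)
    return 1.0 - dice.mean()

# usage: loss = soft_dice_loss(torch.sigmoid(logits), masks); loss.backward()
```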
    • Hierarchical skull registration method with an iterative factor

      Zhu Lipin, Liu Xiaoning, Liu Xiongle, Lu Yanning
      Vol. 22, Issue 4, Pages: 523-531(2017) DOI: 10.11834/jig.20170412
      Abstract: To restore the appearance of an unknown skull in a knowledge-based craniofacial reconstruction procedure, the most similar skull must be retrieved from a database; the reference face can then be found according to that skull. The process of searching for the most similar skull is called skull registration, in which accuracy and efficiency are two important performance targets that cannot be disregarded. A novel skull registration method based on feature region extraction and a modified iterative closest point (ICP) algorithm is proposed in this study, called the hierarchical skull registration method with an iteration factor. First, the skull model is denoised, simplified, and normalized. Then, the convexity or concavity of each point in the point cloud model is determined using an integral-invariant-based method, and the skull surface is divided into concave and convex feature regions via k-means clustering. The similarity of the concave or convex regions between two skulls is subsequently calculated by comparing their principal components and areas, and optimal matching is obtained through exhaustive search. The optimal 3D transform for each potential pair of matched feature regions approximately aligns the skull surfaces. Lastly, the novel ICP algorithm with an iteration factor is applied to achieve fine registration. The method is applied to the skull registration of a Terracotta Army model and a public data set. The registration times of the classical ICP algorithm are 6.23 s, 7.61 s, and 4.17 s, whereas those of the improved ICP algorithm are 3.02 s, 3.23 s, and 2.83 s, respectively. Registration efficiency is roughly doubled and accuracy is also significantly improved, although the iteration factor varies across data sets. Experimental results show that the proposed algorithm achieves better registration accuracy and faster iterative convergence in the fine registration stage. The entire process is completed without human intervention, demonstrates a degree of adaptability, and can be used for similar 3D model registrations.
      Keywords: skull registration; feature regions; integral invariant; principal component analysis; modified iterative closest point (ICP) algorithm; iteration factor
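      For illustration, the sketch below performs one rigid ICP iteration (nearest neighbours plus SVD alignment) with a damping factor applied to the update. How the paper's iteration factor actually enters the update is not detailed in the abstract, so the simple damped step shown here is an assumption.

```python
# Hedged sketch: one rigid ICP iteration (nearest neighbours + SVD alignment)
# with an assumed damping/acceleration "iteration factor" on the update.
import numpy as np
from scipy.spatial import cKDTree

def icp_step(src, dst, tree: cKDTree, factor=1.0):
    """Return src moved one (damped) rigid step toward its nearest points in dst."""
    _, idx = tree.query(src)
    matched = dst[idx]
    mu_s, mu_d = src.mean(axis=0), matched.mean(axis=0)
    H = (src - mu_s).T @ (matched - mu_d)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                       # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_d - R @ mu_s
    moved = (R @ src.T).T + t
    return src + factor * (moved - src)            # damped / accelerated update

# usage: tree = cKDTree(dst); for _ in range(50): src = icp_step(src, dst, tree, factor=1.2)
```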
    • Wu Zhe, Zeng Jiexian, Gao Qiqi
      Vol. 22, Issue 4, Pages: 532-541(2017) DOI: 10.11834/jig.20170413
      Abstract: The detection and recognition of aircraft targets in remote sensing images has recently become a popular research topic both in China and abroad. In traditional methods, invariant features are extracted from segmented targets to train learning machines, and such methods work effectively when interference is limited. In practice, however, several interfering factors, including non-uniform or unstable illumination, complex backgrounds, and noise, degrade the quality of remote sensing images; traditional methods then become time-consuming and cannot achieve high recognition accuracy. To recognize aircraft targets in remote sensing images rapidly and accurately, this study proposes an aircraft target recognition algorithm based on saliency images as well as global and local features. Visually salient targets in a remote sensing image are extracted using a modified Itti algorithm, and a region-growing algorithm and a line-marking algorithm are applied to find connected regions, from which the number and locations of candidate targets are determined. Multi-scale autoconvolution (MSA), pseudo-Zernike moment, and Harris-Laplace features are extracted, and their stability is evaluated based on the ratio of the standard deviation to the mean. The selected features are combined into a feature vector, and the candidate targets are finally recognized with a support vector machine (SVM). Experimental results show that the detection and recognition accuracies of the proposed algorithm are 97.2% and 94.9%, respectively, both higher than the values achieved by existing methods. In addition, the proposed algorithm offers low time consumption, a low false alarm rate (0.03), and strong robustness against noise, complex backgrounds, and affine transformations. The combination of the employed features, namely MSA, pseudo-Zernike moment, and Harris-Laplace, provides more discriminative information than any single feature, and the algorithm improves both recognition efficiency and anti-interference ability.
      Keywords: aircraft target recognition; remote sensing image; saliency image; multi-scale autoconvolution (MSA); invariant moments; Harris-Laplace
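      A simplified sketch of the feature-screening-plus-SVM stage described above: candidate feature dimensions are kept when their coefficient of variation (standard deviation over mean) across training samples is small, and an SVM is trained on the retained dimensions. Screening per column rather than per feature type, the threshold, and the scikit-learn settings are assumptions.

```python
# Hedged sketch: keep stable feature dimensions (low coefficient of variation)
# and train an SVM on them. Threshold and SVM settings are illustrative.
import numpy as np
from sklearn.svm import SVC

def stable_columns(feats, max_cv=0.5):
    """Keep feature columns whose coefficient of variation is below max_cv."""
    mean = np.abs(feats.mean(axis=0)) + 1e-8
    cv = feats.std(axis=0) / mean
    return np.where(cv < max_cv)[0]

def train_recognizer(feats, labels, max_cv=0.5):
    cols = stable_columns(feats, max_cv)
    clf = SVC(kernel="rbf", C=10.0, gamma="scale").fit(feats[:, cols], labels)
    return clf, cols

# usage: clf, cols = train_recognizer(train_feats, train_labels)
#        preds = clf.predict(candidate_feats[:, cols])
```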
    • Peng Qian, Zhang Bing, Sun Xu, Gao Lianru, Yu Wenbo
      Vol. 22, Issue 4, Pages: 542-550(2017) DOI: 10.11834/jig.20170414
      摘要:The problem of mixed pixels is common in hyperspectral remote sensing image processing and analysis, and the non-negative matrix factorization (NMF) method has been introduced into hyperspectral unmixing. This study proposes a new spatial and spectral preprocessing technique for constrained NMF, thereby making the unmixing process robust to noise. The spatial and spectral preprocessing technique is based on constrained NMF, such as minimum volume constrained NMF and graph-regularized NMF. The unmixing result of constrained NMF is improved by obtaining better endmember candidates from the preprocessing within the spatial and spectral information of the neighborhood. Spatial preprocessing and spatial-spectral preprocessing can both improve the unmixing accuracy of constrained NMF, such as minimum volume constrained NMF and graph-regularized NMF, in five groups of simulation data with different signal-to-noise ratios (SNRs). Spatial preprocessing can enhance the results of constrained NMF in all SNRs, whereas spatial-spectral preprocessing effectively optimizes accuracy, particularly for conditions with low SNR. Real data experiment based on the well-known hyperspectral data captured by the Airborne Visible/Infrared Imaging Spectrometer over Cuprite, Nevada is conducted. Spatial preprocessing improves the unmixing results of constrained NMF in the real data set, and spatial-spectral preprocessing is better than spatial preprocessing. The experiments on both simulated data and real data show that spatial and spectral preprocessing can efficiently improve the unmixing accuracy of constrained NMF, particularly in low SNR condition. This finding indicates robustness to noise with spatial and spectral information. This study introduces spatial and spectral preprocessing into constrained NMF. The unmixing process becomes more robust to noise with constrained NMF by optimizing the endmember candidate using the spatial and spectral information of hyperspectral remote sensing data in complex remote sensing scenes.  
      Keywords: hyperspectral image; pixel unmixing; non-negative matrix factorization; spatial and spectral preprocessing; spectral mixture analysis
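      As a rough illustration of the spatial preprocessing idea, the sketch below scores each pixel by its mean spectral angle to its neighbourhood and keeps the most spatially homogeneous pixels as endmember candidates for the constrained NMF that follows. The window size, the percentile kept, and this particular scoring are assumptions, not the paper's preprocessing.

```python
# Hedged sketch: score pixels by spectral-angle homogeneity within a small
# window and keep the most homogeneous ones as endmember candidates for a
# subsequent constrained NMF. Window size and percentile are assumptions.
import numpy as np

def spectral_angle(a, b, eps=1e-12):
    return np.arccos(np.clip(np.dot(a, b) /
                             (np.linalg.norm(a) * np.linalg.norm(b) + eps), -1.0, 1.0))

def homogeneity_map(cube, win=1):
    """cube: (H, W, B). Mean spectral angle of each pixel to its (2*win+1)^2 neighbourhood."""
    H, W, _ = cube.shape
    score = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            y0, y1 = max(i - win, 0), min(i + win + 1, H)
            x0, x1 = max(j - win, 0), min(j + win + 1, W)
            neigh = cube[y0:y1, x0:x1].reshape(-1, cube.shape[2])
            score[i, j] = np.mean([spectral_angle(cube[i, j], s) for s in neigh])
    return score

def candidate_mask(cube, keep_percentile=20):
    s = homogeneity_map(cube)
    return s <= np.percentile(s, keep_percentile)   # most homogeneous pixels

# The pixels selected by candidate_mask would then initialize / constrain the NMF unmixing.
```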
    • Ma Jingzhen, Sun Qun, Xiao Qiang, Zhao Guocheng, Zhou Zhao
      Vol. 22, Issue 4, Pages: 551-562(2017) DOI: 10.11834/jig.20170415
      摘要:Global land cover and its change are indispensable basic information for environmental change research, detection of national geographical conditions, and sustainable development. In 2014, the National Geomatics Center of China produced GlobeLand30, a remote sensing mapping product with the highest resolution (30 m) in the world. This data set exhibits high resolution and high accuracy, and thus, it can satisfy the cartographic requirement of 1∶250 000 and other smaller measuring scales as well as provide global data production and updating with significant data resources. When the difference between GlobeLand30 and vector data is considered, accurately and rapidly recognizing identical entities from these two types of data is highly important to update information as well as integrate and fuse multisource and multiscale spatial data. To overcome the shortcomings of the method for recognizing and matching GlobeLand30 and vector data, this study proposes a complex function based on a multilevel arc-height. Fourier shape descriptors can be obtained to measure shape similarity among area entities, and a comprehensive similarity measurement model can be established by integrating the location, size, direction, and shape of area entities. This study uses the proposed model to recognize and match GlobeLand30 and vector data as well as to measure the similarity of Globeland30 before and after simplification and smoothing. This method constructs a complex function based on multilevel chord length to describe the entire and detailed features of area entities by using the characteristics of their border, such as arc-height and central distance. After resampling the border of area entities, a shape descriptor with an independent and compact initial point on the border can be established through fast Fourier transform, which exhibits invariant properties in terms of translation, rotation, and scaling to measure shape similarity and diversity among area entities. Lastly, a comprehensive similarity measuring model is established by integrating the location, size, direction, and shape of area entities. GlobeLand30 and vector data are processed, and the similarity of the entities of these two types of data is calculated through the comprehensive similarity measuring model. Then, the specific entity is determined according to the set comprehensive spatial similarity threshold value. The rule for maintaining similarity by applying the proposed comprehensive similarity measuring model is discussed to measure the shape similarity and comprehensive similarity of Globeland30 data before and after applying different simplification and smoothing algorithms. This study selects the water data of Globeland30 (2010) as example and uses the proposed model to match them with another vector data after vectorization. Experimental results obtained a precision ratio of 100% and a recall ratio of 97.01%. The experiment, which is conducted to compare the method proposed in this study with others, shows that the method for describing tortuosity can only describe the entire, but not the detailed features. The description of similarity via central distance instead of Fourier shape descriptors increases the difference in similarity, which will result in omitting matching or other mistakes. The discussion regarding the effect of different arc-height levels proves that both matching precision ratio and recall ratio reach their maximum values when is set to 4 or 8. 
Computational complexity is positively related to the arc-height level, and thus matching speed decreases as the level is set higher; setting the level to 4 achieves a satisfactory balance of efficiency and accuracy. The point_remove and bend_simplify algorithms are applied for simplification at different threshold values, whereas the peak and Bezier algorithms are selected for smoothing, and the similarity measuring method is then applied to GlobeLand30 data before and after simplification and smoothing. From the results, we discuss the relation between similarity levels and threshold values. The findings show that the two simplification algorithms maintain approximately the same similarity when threshold values vary within a small range; variations outside this range produce an evident difference, reflected in the sharp-corner artifacts introduced by the point_remove algorithm. For the two smoothing algorithms, Bezier provides only one result, whereas the results of peak vary with different threshold values. In summary: 1) The method constructs a multilevel chord-length complex function to describe both the overall and the detailed features of area entities by using border characteristics such as multilevel chord length, arc-height, and central distance. This complex function supports multilevel shape description by changing the arc-height level, and its Fourier transform resolves inconsistencies in the border and in the number and starting position of sampled points, thereby providing invariance to translation, rotation, and scaling. 2) A comprehensive similarity measuring model is established on the basis of the multilevel arc-height Fourier shape description by integrating the location, size, direction, and shape of area entities; experiments prove that this model works efficiently when matching the two types of data. The study also examines how similarity is maintained under different simplification and smoothing algorithms by applying the similarity measuring method to GlobeLand30 data before and after processing, with good results. Using the water data as an example, it matches the entities of GlobeLand30 with another vector data set through the comprehensive similarity measuring model. Further studies will focus on matching GlobeLand30 data with other vector data and applying the results to produce and update vector data, particularly for overseas regions worldwide.
      Keywords: GlobeLand30; complex function of multilevel arc-height; Fourier description; similarity measuring; area matching
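      A minimal sketch of a boundary Fourier shape descriptor in the spirit of the method above: the polygon boundary is resampled, a boundary function is built, and normalized FFT magnitudes give a translation-, rotation-, scale-, and start-point-invariant signature. The centroid-distance function used here is a simplified stand-in for the paper's multilevel arc-height/chord-length complex function.

```python
# Hedged sketch: boundary Fourier shape descriptor (centroid-distance variant,
# a simplification of the paper's multilevel arc-height complex function).
import numpy as np

def resample_boundary(pts, n=128):
    """Resample a closed polygon boundary (M, 2) to n points by arc length."""
    pts = np.vstack([pts, pts[:1]])
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])
    t = np.linspace(0.0, s[-1], n, endpoint=False)
    x = np.interp(t, s, pts[:, 0])
    y = np.interp(t, s, pts[:, 1])
    return np.stack([x, y], axis=1)

def fourier_descriptor(pts, n=128, keep=20):
    b = resample_boundary(pts, n)
    centroid = b.mean(axis=0)
    z = np.linalg.norm(b - centroid, axis=1)          # centroid-distance function
    mag = np.abs(np.fft.fft(z))
    return mag[1:keep + 1] / (mag[0] + 1e-12)         # scale-normalized, phase dropped

def shape_similarity(pts_a, pts_b):
    fa, fb = fourier_descriptor(pts_a), fourier_descriptor(pts_b)
    return 1.0 / (1.0 + np.linalg.norm(fa - fb))

# This shape score would be one term in the comprehensive similarity model,
# alongside the location, size, and direction terms mentioned in the abstract.
```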