Latest Issue

    Vol. 22, Issue 12, 2017
    • Review of Lattice Boltzmann method for image processing

      Liu Yingqian, Yan Zhuangzhi
      Vol. 22, Issue 12, Pages: 1623-1639(2017) DOI: 10.11834/jig.170193
      Abstract: Currently, images such as 3D medical and high-resolution satellite images provide considerable information, and image processing results are required in real time in many settings, such as clinical and meteorological applications. Parallel image processing devices, such as graphics processing units (GPUs) and field-programmable gate arrays, have become available to engineers at a convenient price. The partial differential equation (PDE) method is extensively used in image processing. However, its solution methods are time-consuming and difficult to map directly onto a GPU, and traditional PDE solution methods rest on the assumption that space and time are continuous. Thus, a method that is naturally parallel, simple, and with clear physical meaning is required to simulate the macro model described by the PDE. Recently, the lattice Boltzmann (LB) method has been applied to image denoising, inpainting, registration, and segmentation as an efficient and flexible method for modeling and solving PDEs. However, a systematic review of the applications of LB for image processing has not appeared in previous studies. Therefore, this paper provides such a literature review to support scholars in gaining further insight into the frontier development of the topic. In this work, numerous public reports on the applications of LB for image denoising, inpainting, segmentation, and other 3D image processing were initially surveyed using the keywords "lattice Boltzmann" and "image processing." These reports were classified according to how scholars proposed the LB mathematical models, namely, "top-down" or "bottom-up" approaches in terms of macro or micro perspectives, respectively. Then, programming algorithms, computing complexities, and application scenarios of LB for image processing were analyzed and summarized. Finally, essential differences between LB and other PDE-solving methods were identified, and further research directions on this topic were proposed. First, the LB model has a clear physical meaning. The general LB method consists of two steps: a streaming step, in which particles (or particle densities) move from node to node on a lattice, and a collision step, in which particles (or particle densities) are redistributed at each node. The two steps are governed by the LB evolution equation, where two parameters, the relaxation time and the source term, decide the movement of the particles. The state of each node at the next moment is related only to the state of its neighboring nodes because the particles move along the links. In image processing, each pixel value is treated as a particle density, and changes in pixel value can be regarded as a redistribution of particles decided by the relaxation time and the source term, in which image information such as gradient and curvature is embedded. Second, macro models can be classified into anisotropic, nonlinear, and linear diffusion models according to the diffusion tensor. At the micro level, the differences among these macro models are determined by the relaxation time. In the anisotropic diffusion model, the relaxation time takes a different value on each link, and the value changes along each link according to image information. In the nonlinear diffusion model, the relaxation time takes the same value on all links, and the value changes in a manner similar to the anisotropic model. In the linear model, the relaxation time takes the same constant value on all links. The relaxation time changes in the anisotropic and nonlinear models because the pixel values change after each iteration; the source term changes in models with external force terms.
Both parameters must be computed in each iteration; consequently, the computing complexities of the anisotropic, nonlinear, and linear diffusion models decrease in that order. The computing complexity of LB models that include external force terms is also decided by the source terms. Third, the "top-down" approach uses the LB evolution equation to construct a macro model that matches an existing image processing PDE; the relaxation time and the source term are then determined by the PDE diffusion tensor and external force term, respectively. The "bottom-up" approach constructs the relaxation time and the source term directly according to the physical meaning of the LB method. The first approach uses the LB method as an alternative approach to solving PDEs and requires high mathematical skill. The second approach makes constructing the mathematical model easy and flexible. Fourth, the LB method is inherently parallel and naturally suited to GPUs, which are ideal for explicit, local, lattice-based computations. The speed advantage of LB is obvious when the volume of image data is large: the GPU/CPU speedup factors are larger for large 3D volumes than for small ones. The programming of LB is simple; the core LB algorithm can be implemented in a few lines of code within a short development time. Fifth, the anisotropic and nonlinear diffusion models can be used in image denoising and inpainting, and the design of the external force terms significantly influences the quality of image segmentation. The LB method has high research value as a naturally parallel algorithm for fast image processing, such as 3D image denoising, inpainting, and segmentation. However, several problems must still be studied further, such as image boundary processing, parallel platform selection, and optimization.
      Keywords: image processing; lattice Boltzmann (LB) method; image diffusion LB model; parallel algorithm; time complexity
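The streaming and collision steps described in the abstract map naturally onto array operations. Below is a minimal sketch of one LB iteration for the linear diffusion case on a D2Q5 lattice; the lattice choice, the single-relaxation (BGK) collision rule, the periodic boundaries implied by np.roll, and the value of tau are illustrative assumptions, not the specific models surveyed in the paper.

```python
import numpy as np

def lb_diffusion_step(f, tau=0.8):
    """One collision + streaming step of a D2Q5 lattice Boltzmann model.

    f   : particle distributions, shape (5, H, W); pixel value = f.sum(axis=0)
    tau : relaxation time (constant on all links, i.e., the linear model)
    """
    w = np.array([1/3, 1/6, 1/6, 1/6, 1/6])   # D2Q5 equilibrium weights
    rho = f.sum(axis=0)                        # macroscopic density = pixel value
    feq = w[:, None, None] * rho               # local equilibrium distribution
    f = f - (f - feq) / tau                    # collision: relax toward equilibrium
    f[1] = np.roll(f[1],  1, axis=1)           # streaming along +x
    f[2] = np.roll(f[2], -1, axis=1)           # streaming along -x
    f[3] = np.roll(f[3],  1, axis=0)           # streaming along +y
    f[4] = np.roll(f[4], -1, axis=0)           # streaming along -y
    return f

# usage: initialize distributions from a noisy image and iterate
img = np.random.rand(64, 64)
f = np.array([1/3, 1/6, 1/6, 1/6, 1/6])[:, None, None] * img
for _ in range(10):
    f = lb_diffusion_step(f)
smoothed = f.sum(axis=0)
```

Because each update touches only a node and its four neighbors, every pixel can be processed independently, which is exactly the locality that makes the method map well to GPUs.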
    • Review of surface defect detection based on machine vision

      Tang Bo, Kong Jianyi, Wu Shiqian
      Vol. 22, Issue 12, Pages: 1640-1663(2017) DOI: 10.11834/jig.160623
      Abstract: Surface defects of industrial products adversely affect appearance, comfort, and service performance, so enterprises must detect these defects to control product quality in time. Manual inspection is the traditional way of surface defect detection and is characterized by low sampling rate, low accuracy, low efficiency, poor real-time performance, high labor intensity, and sensitivity to inspector experience. Detection based on machine vision can significantly overcome these disadvantages of manual inspection. The machine vision detection method can reveal problems in the production process on the basis of the detection results, thereby eliminating or reducing product defects, preventing potential trade disputes, and maintaining enterprise reputation. Detection based on machine vision has produced many achievements and applications in metal, paper printing, textile, ceramic tile, glass, and wood surface defect detection in China and abroad. The research and application of surface defect detection based on machine vision are reviewed here on the basis of extensive research and the development results in the relevant literature. The basic structure and working principle of a typical machine vision surface defect detection system are analyzed, and the research status and existing visual software and hardware platforms are introduced. The relevant theory and image algorithms for preprocessing, segmentation, feature extraction and optimization, and image recognition are summarized. The main difficulties and developments in visual detection of surface defects are presented, and the development trend in this field is outlined. A machine vision surface defect detection system includes the following modules: image acquisition, image processing, image analysis, data management, and man-machine interface. The image acquisition module mainly consists of charge-coupled device (CCD) cameras, optical lenses, and light sources. The image processing module mainly involves image denoising, image enhancement and restoration, defect detection, and object segmentation. The image analysis module is mainly concerned with feature extraction, feature selection, and image recognition. The data management and man-machine interface module can display the defect type, position, shape, and size and can carry out image storage, query, and statistics. Image preprocessing aims to reduce noise and improve image quality and usually includes spatial- and frequency-domain methods. In recent years, mathematical morphology and wavelet methods have been used in image denoising with good results. Image segmentation divides an image into several non-overlapping regions; each region possesses the same or similar properties or characteristics, whereas the image features of different regions differ clearly. Existing image segmentation methods are mainly divided into threshold-based, region-based, and edge-based segmentation and methods based on specific theories. At present, new theories and methods from other disciplines are also being applied to image segmentation. Image feature extraction is the mapping from a high-dimensional image space to a low-dimensional feature space. Image features can be divided into physical, structural, and mathematical characteristics.
No method yet exists for a machine to simulate the human eye and nervous system and thereby perceive physical and structural features directly; hence, mathematical characteristics are used to describe image features in digital image processing. The commonly used image features at present are mainly textural, color, and shape features. If the dimension of the extracted image features is too high, redundant information will exist in the extracted features, which not only increases the processing time but also decreases the accuracy of image processing. The correlation among feature dimensions can be decreased by reducing the feature dimension through feature selection or optimization. Feature selection methods mainly include principal component analysis, independent component analysis, self-organizing maps, genetic algorithms, Fisher discriminant analysis, correlation analysis, Relief, tabu search, and nonlinear dimensionality reduction methods. No theory for guiding the selection and optimization of features is available to date. Statistical and syntactic pattern recognition are the two basic pattern recognition approaches, and artificial neural networks and support vector machines are the most widely used statistical pattern recognition methods. Surface defect detection based on machine vision will remain the main direction in the future. Theoretical research and practical application of machine vision surface defect detection have obtained encouraging results to date, but some problems and difficulties remain to be solved. Image processing and analysis algorithms, which include image preprocessing, segmentation of defect regions, feature extraction and selection, and defect recognition and classification, are central. Many algorithms have appeared in each stage of the processing flow, and each possesses its own advantages, disadvantages, and range of application. Researchers have focused mostly on improving the signal-to-noise ratio, accuracy, efficiency, real-time performance, and robustness of the detection system. Simulating the information processing function of the human brain to construct an intelligent machine vision system still needs further theoretical research. Surface quality inspection based on machine vision has been attracting much attention and application in modern automatic production. Machine vision surface defect detection is complex and involves many disciplines and theories. Machine vision is the simulation of human vision, but the visual mechanism of humans remains unclear, and expressing the human visual process on a computer is difficult. Therefore, the construction of machine vision inspection systems should be further improved through research on biological vision mechanisms. Accordingly, detection will develop further in the direction of automation and intelligence.
      Keywords: machine vision; surface defect; detection algorithm; image processing; image recognition
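As a concrete illustration of the acquisition, preprocessing, segmentation, and feature-extraction flow reviewed above, the sketch below strings together one plausible choice per stage with OpenCV; the Gaussian kernel size, Otsu thresholding, and the particular shape features are illustrative assumptions rather than the methods of any surveyed system.

```python
import cv2

def detect_surface_defects(gray):
    """Sketch of a defect-detection flow: denoise, segment, extract features.

    gray : uint8 grayscale image from the acquisition module
    """
    blur = cv2.GaussianBlur(gray, (5, 5), 0)              # preprocessing: noise suppression
    _, mask = cv2.threshold(blur, 0, 255,
                            cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)  # threshold segmentation
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    features = []
    for c in contours:                                     # per-defect shape features
        x, y, w, h = cv2.boundingRect(c)
        features.append({"area": cv2.contourArea(c),
                         "aspect": w / max(h, 1),
                         "bbox": (x, y, w, h)})
    return mask, features                                  # features feed a classifier (e.g., SVM)
```

In a full system, the feature dictionaries would be vectorized, reduced by one of the feature selection methods listed above, and passed to a statistical classifier for defect recognition.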
    • Li Rong, Li Xiangyang
      Vol. 22, Issue 12, Pages: 1664-1676(2017) DOI: 10.11834/jig.170101
      Abstract: Reversible image data hiding refers to hiding information in an image such that the receiver not only can extract the complete secret information correctly but also can restore the original image without loss. Reversible image data hiding is widely used in various fields, such as covert communication, medical image processing, copyright protection, and remote sensing. At present, many methods implement reversible image data hiding, including methods based on lossless compression, integer transform, histogram shifting, and prediction error expansion. The reversible data hiding algorithm based on pixel value ordering (PVO) has received wide attention and constant improvement because of its high-fidelity superiority. In this study, an improved PVO reversible data hiding algorithm based on the idea of image block selection is proposed to improve the embedding performance of the PVO algorithm and the peak signal-to-noise ratio (PSNR) of the stego-image. The PVO method proposed by Li et al. proceeds as follows. First, the cover image is divided into non-overlapping groups. Second, the pixel values in each group are sorted in ascending order. Third, for the sorted pixel values in each group, the maximum pixel value is predicted by the second-largest pixel value to obtain the maximum prediction error, and the minimum pixel value is predicted by the second-smallest pixel value to obtain the minimum prediction error. Finally, if a prediction error equals 1, the pixel is used to carry the secret data; if a prediction error is greater than 1, the pixel is shifted to create a vacancy; otherwise, the prediction error is discarded in data embedding. The PVO algorithm usually uses prediction errors equal to 1 for information hiding and thus exhibits good utilization and concealment capability in smooth image regions. Otherwise, algorithm performance clearly decreases, and the performance of the PVO algorithm is closely related to the pixel distribution. Given that the distribution of pixels in different regions is non-uniform, if the image is divided equally into several regions, the embedding capacity available to the PVO algorithm in different regions differs. When the same amount of information is embedded in each partition, the PSNR values also differ. Thus, this study proposes the idea of image block selection to fully utilize the embedding space and improve the embedding performance. First, the original image is divided into several non-overlapping block areas, and the block areas are selected in ascending order of shift rate. Second, the appropriate embedding prediction error is selected in each block area. Finally, information embedding is carried out using the original PVO method on the basis of the block selection order and the optimal embedding difference of each block. First, images divided into 8×8 blocks are used to compare the improved algorithm with the original PVO algorithm. When the embedding amount is 1×10⁴ bits, the shift rate of the Elaine image is reduced from 81.59% to 74.40%, the PSNR is increased from 55.3882 dB to 56.9969 dB, a gain of 1.6087 dB, the PSNR value of the Aerial image is improved by 1.88 dB, and the PSNR value of the Baboon image is improved by 2.29 dB.
The PSNR value of each image shows various degrees of improvement when the experiments are performed with different embedding amounts. Second, images divided into different numbers of blocks, such as 2×2, 4×4, 8×8, or 16×16, are used to compare the PSNR with that of the original PVO algorithm. The PSNR value of the Lena image gradually increases from 59.2046 dB to 60.8469 dB when the embedding amount is 1×10⁴ bits, and the PSNR values of the other images also increase with the number of blocks. Finally, the maximum embedding amount of the algorithm is counted for different partitions. The maximum embedding amount of the original PVO algorithm is 14 972 bits; the maximum is 14 992 bits when the image is divided into 4×4 blocks and increases to 15 753 bits when the image is divided into 16×16 blocks. The maximum embedding amount improves as the number of blocks increases. In this study, an improved PVO algorithm based on image block selection is proposed to increase the use of the embedding space according to the distribution of pixels. By adopting the image block selection and embedding prediction error optimization strategies in the information hiding process, the improved algorithm obtains higher PSNR values and improves the visual quality of the stego-image at the same embedding amount. Within a certain number of blocks, the number of blocks is positively correlated with the PSNR value of the image: as the number of blocks increases, the PSNR value also increases. The method improves the embedding capacity to a certain extent and compensates for the increase in auxiliary information caused by the increase in the number of partitions. The algorithm exhibits a certain improvement in embedding capacity and image fidelity compared with the original PVO algorithm. Subsequent research will focus on partition rules and optimization issues to further improve the performance of PVO-based algorithms.
      Keywords: pixel value ordering; block selection; embedding prediction error; reversible data hiding; pixel shift rate; peak signal-to-noise ratio
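The embedding rule summarized above (embed at prediction error 1, shift at error greater than 1, skip otherwise) can be written compactly. The sketch below covers only maximum-side embedding for one block and omits the paper's block selection and auxiliary information handling; it illustrates the base PVO step, not the authors' full algorithm.

```python
import numpy as np

def pvo_embed_max(block, bit):
    """Embed one bit via the maximum-pixel PVO prediction error.

    block : 1-D array of one block's pixel values (a modified copy is returned)
    bit   : secret bit in {0, 1}
    """
    v = block.copy()
    order = np.argsort(v, kind="stable")         # ascending pixel-value order
    e = int(v[order[-1]]) - int(v[order[-2]])    # max predicted by second-largest
    if e == 1:
        v[order[-1]] += bit                      # error 1: carry the secret bit
    elif e > 1:
        v[order[-1]] += 1                        # error > 1: shift to keep decodability
    # e == 0: block unused for embedding on the maximum side
    return v
```

The minimum side is symmetric (subtracting instead of adding), and the decoder inverts these cases exactly, which is what makes the scheme reversible.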
    • Zhu Yangang, Zhang Guimei
      Vol. 22, Issue 12, Pages: 1677-1689(2017) DOI: 10.11834/jig.170198
      Abstract: The total variation (TV) denoising model is not ideal for preserving the weak edges and weak textural details of images, even though it achieves satisfactory noise reduction. The adaptive fractional TV (AFTV) algorithm was presented to identify the texture and non-texture areas of an image based on local image information, with the soft threshold value calculated adaptively. Thus, the weak edges and weak texture details in a noisy image can be preserved substantially better than with the traditional TV algorithm. However, the preservation of weak edges and weak texture details deteriorates as the noise increases, causing an evident staircase effect. To address this problem, the current study proposes a novel fractional TV denoising algorithm by applying fractional differential theory combined with TV and the characteristics of the residual image. A fractional-order TV model is proposed to substitute for the original first-order TV model. The image is divided into texture and flat areas based on the accurate local variance of the residual image, which makes the adaptive selection of the fidelity parameter considerably more reasonable. Consequently, the denoising performance is improved, particularly when processing images with heavy noise. To verify the denoising effect and texture-preserving capability, experiments on images with three different noise levels are performed using the proposed algorithm and the TV and AFTV models. In these experiments, the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) are adopted as evaluation indexes. For example, Gaussian noise with a standard deviation of 30 is added in the experiment on the classic cameraman image, which is acquired from a standard image library. This experiment shows that the distribution of texture details and noise areas in the residual image is disorganized and inaccurate under the AFTV algorithm but more reasonable and refined under the proposed model. Additional experiments are conducted using standard and natural images, including Lena, Barbara, and Rock, contaminated by Gaussian noise, salt-and-pepper noise, and Poisson noise at different levels. On the one hand, for Gaussian noise, noise with standard deviations of 10, 20, 30, 40, and 50 is added to the test images. Experimental results show that the PSNR value obtained by the proposed algorithm is 2.72 dB higher than that of the TV model and 1.38 dB higher than that of the AFTV model. Simultaneously, the SSIM value obtained by the proposed algorithm is 0.047 higher than that of the TV model and 0.020 higher than that of the AFTV model. On the other hand, salt-and-pepper noise with densities of 10%, 15%, 20%, 25%, and 30% is added to the test images. The experimental results demonstrate that the PSNR obtained by the proposed algorithm combined with median pre-filtering is 1.308 dB higher than the PSNR under the traditional median filtering algorithm. Simultaneously, the SSIM of the proposed algorithm with median pre-filtering is 0.011 higher than that of the traditional median filtering algorithm. For Poisson noise, the PSNR and SSIM of the proposed algorithm approximate those of the AFTV algorithm but are 1.59 dB and 0.005 higher, respectively, than those of the TV algorithm.
Experiments were performed on images with three different noise levels, and the corresponding PSNR and SSIM values were calculated for comparison. The PSNR, SSIM, and visual quality of the proposed method are superior to those of the TV and AFTV models, particularly for images polluted by heavy noise. Meanwhile, the time consumed by the TV model is low, but its denoising effect is relatively poor. By contrast, the time consumption of the proposed algorithm is nearly the same as that of the AFTV algorithm, while its denoising effect is better. Moreover, the proposed algorithm performs well in removing a variety of typical noises and can be used universally. In summary, the proposed method mitigates the staircase effect for images with severe noise while preserving weak edges and texture details more effectively than the TV and AFTV denoising models.
      Keywords: image denoising; fractional-order derivative; total variation; residual image; weak edge; weak texture
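For reference, the first-order TV baseline that the fractional model replaces can be minimized by an explicit gradient flow; a minimal sketch follows, in which the step size dt, fidelity weight lam, and the small eps regularizer are illustrative assumptions.

```python
import numpy as np

def tv_step(u, f, lam=0.1, dt=0.2, eps=1e-6):
    """One explicit step of TV flow: u_t = div(grad u / |grad u|) - lam (u - f).

    u : current estimate; f : noisy observation (both 2-D arrays)
    """
    ux = np.roll(u, -1, axis=1) - u                     # forward differences
    uy = np.roll(u, -1, axis=0) - u
    mag = np.sqrt(ux**2 + uy**2 + eps)                  # regularized gradient magnitude
    px, py = ux / mag, uy / mag
    div = (px - np.roll(px, 1, axis=1)) \
        + (py - np.roll(py, 1, axis=0))                 # backward-difference divergence
    return u + dt * (div - lam * (u - f))               # curvature motion + data fidelity
```

The staircase effect the paper targets arises from this first-order curvature term; the fractional-order derivative replaces it with a weaker, longer-range smoothing, and the residual-image variance steers lam between texture and flat areas.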
    • Image super-resolution using multi-channel convolution

      Li Yunfei, Fu Randi, Jin Wei, Ji Nian
      Vol. 22, Issue 12, Pages: 1690-1700(2017) DOI: 10.11834/jig.170325
      Abstract: Super-resolution (SR) technology satisfies the demand for high-quality images. The approach was first proposed in the 1960s, and its goal is to obtain one or a series of high-resolution (HR) images from one or a sequence of low-resolution (LR) images. SR technology not only improves the visual quality of images but also helps improve image analysis and processing tasks, including object recognition, image retrieval, and object detection. SR technology is widely used in real life, such as in video surveillance, medical image processing, and remote sensing image processing. Traditional methods, such as interpolation-, reconstruction-, and learning-based algorithms, cannot achieve desirable SR results within short SR times. In recent years, a modern convolutional neural network (CNN)-based method called super-resolution CNN (SRCNN) was proposed. SRCNN is a deep learning method for single-image SR that directly learns an end-to-end mapping between LR and HR images. This method achieves better SR results and times than do the traditional ones but still presents several limitations. SRCNN uses a stacked CNN structure and Gaussian initialization, resulting in slow convergence and time-consuming model training. Furthermore, SRCNN exhibits limited nonlinear mapping capability and simple feature extraction because it comprises only three convolutional layers, and it can generate unclear HR images with blurry texture. An image SR method based on a multi-channel CNN (MCSR) is proposed to resolve these issues. MCSR adopts two strategies, a residual CNN model and the MSRA initialization method, to accelerate the convergence of model training. Given that a residual CNN possesses an identity mapping from input to output, the model training explicitly models the residual image, which is the difference between the HR and LR images. This change is advantageous because LR and HR images share the same information to a large extent. The MSRA initialization method maintains the variances of activations and back-propagated gradients when moving up or down the network. Both schemes result in substantially faster convergence. At the same time, two schemes are suggested to improve SR performance. The deeper the CNN structure, the better its performance. MCSR replaces the large convolution kernels chosen by SRCNN, such as 9×9, with several layers of small convolution kernels, such as 3×3. As a result, MCSR obtains seven layers of convolution kernels and exhibits enhanced nonlinear mapping capability. In addition to being deepened, MCSR is widened to multiple channels in the nonlinear mapping part. Specifically, the basic MCSR possesses four channels: one layer of 3×3 convolution kernels, two layers of stacked 3×3 convolution kernels, one layer of 1×5 convolution kernels, and one layer of 5×1 convolution kernels. Experimental results show that different channels produce dissimilar feature maps. In particular, the 3×3 channel produces local feature maps, the stacked 3×3 channel produces relatively global feature maps, the 1×5 channel extracts transversal textural features, and the 5×1 channel extracts vertical textural features. Furthermore, MCSR possesses an extra layer of 1×1 convolution kernels for compressing the dimension of the feature maps, providing the method with powerful nonlinear capability.
Powerful nonlinear mapping capability and diverse feature maps result in good SR performance. The proposed MCSR is trained on the Image91 dataset, the same as SRCNN, and tested on the Set5, Set14, and BSD200 datasets. Experimental results demonstrate that MCSR converges within 4×10 backprops, whereas SRCNN needs at least 1.5×10 backprops. The average peak signal-to-noise ratios (PSNRs) with an upscaling factor of 3 on Set5, Set14, and BSD200 are 32.84 dB, 29.28 dB, and 29.03 dB, increases of 0.45 dB, 0.27 dB, and 0.38 dB, respectively, over SRCNN. The structural similarity measure also improves considerably. With regard to subjective quality, MCSR produces high-quality HR images with clear texture, and the produced images barely show shadow and ripple effects. These findings indicate that MCSR achieves good SR performance. Notably, we propose an extended method called MCSR-Ex, which widens MCSR to five channels. The additional channel consists of three layers of 3×3 convolution kernels and improves the PSNR on the Set5 dataset by approximately 0.1 dB on average. In this study, a new SR method called MCSR is proposed. On the one hand, the combination of the residual model and the MSRA initialization method significantly accelerates the convergence of model training. On the other hand, the two suggested schemes, widening the CNN model to multiple channels and deepening it to seven layers, considerably improve SR performance. In other words, the good SR performance is attributed to extracting various feature maps and making full use of them.
      Keywords: image super-resolution; deep learning; convolutional neural network; multi-channel convolution; residual learning
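The four-channel nonlinear mapping stage described above can be sketched in PyTorch as follows; the channel width, the placement of activations, and the fusion ordering are assumptions for illustration, not the exact published configuration.

```python
import torch
import torch.nn as nn

class MultiChannelBlock(nn.Module):
    """Parallel 3x3, stacked 3x3, 1x5, and 5x1 channels fused by a 1x1 conv."""
    def __init__(self, c=64):
        super().__init__()
        self.ch33 = nn.Conv2d(c, c, 3, padding=1)                 # local features
        self.ch33x2 = nn.Sequential(                              # wider receptive field
            nn.Conv2d(c, c, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 3, padding=1))
        self.ch15 = nn.Conv2d(c, c, (1, 5), padding=(0, 2))       # horizontal texture
        self.ch51 = nn.Conv2d(c, c, (5, 1), padding=(2, 0))       # vertical texture
        self.fuse = nn.Conv2d(4 * c, c, 1)                        # 1x1 dimension compression

    def forward(self, x):
        y = torch.cat([self.ch33(x), self.ch33x2(x),
                       self.ch15(x), self.ch51(x)], dim=1)
        return torch.relu(self.fuse(y))
```

In the residual setup the abstract describes, the output of such a mapping would be added to the upscaled LR input so that the network learns only the HR minus LR residual.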
    • Extraction of monocular depth information based on image intensity clue

      Feng Fan, Ma Jie, Yue Zihan, Shen Liang
      Vol. 22, Issue 12, Pages: 1701-1708(2017) DOI: 10.11834/jig.170001
      Abstract: The development of machine vision has made the use of visual methods to solve depth extraction a major topic in computer vision research. The scene image acquired by a monocular vision system is the projection of 3D space onto a 2D plane, and depth information is lost during this transformation. Conversely, extracting depth from 2D images is the acquisition of monocular depth information. Acquiring depth information from monocular vision is the least costly and the most difficult means of non-contact 3D measurement. For a long time, the basic method has been to analyze the surface brightness of an object under different light sources and use the brightness equation to solve for the surface normals and reconstruct the object's 3D surface. Unlike the proposed method, such photometric techniques typically require multiple light sources, which limits their use in a wide range of scenarios, although photometric 3D technology can accurately reconstruct the surface of a single object. Moreover, most existing photometric stereo techniques assume that the incident light is parallel and that the light intensity does not change with distance in order to simplify the calculation of the surface normals. To satisfy this condition, practical applications require strong, distant light sources or large arrays of light sources. By contrast, the algorithm proposed here uses a near point light source whose intensity decays with the inverse square of distance, which matches the actual situation and achieves low cost. The extraction of depth information is a key technology for 3D reconstruction and virtual reality. Traditional monocular methods are computationally heavy, so their application scenarios are limited; monocular information should therefore be exploited to find a convenient way of quickly extracting scene depth. In this study, we integrate photometric 3D, imaging principles, computer vision, and other technologies for analysis. The radiance of an object's surface illuminated by a light source is obtained using a surface reflection model, and the relation between the surface radiance and the brightness of the camera image is deduced using photometric stereo theory. The relationship between depth and brightness change is then derived from the change in image brightness, and an algorithm based on this relationship is designed to obtain depth information. The algorithm is applied to various experimental scenarios. First, the depth of a ladder-like object in a relatively simple scene is estimated. The actual distance is accurately measured with a scale and recorded to 2 decimal places. Then, the algorithm is used to calculate the depth value, to 4 decimal places, under experimental conditions with a maximum total error of 8.6%. Results show that the maximum error of the experiment is less than 9%. The experimental conditions can be improved on the basis of the overall results to achieve the desired requirements of the algorithm. Experimental results show that the proposed algorithm achieves good recovery in simple scenes and other daily scenes and that the accuracy of the depth values is over 90%. In this study, depth information is estimated from the image intensity change caused by the movement of the light source, thereby avoiding the complicated camera calibration process.
The algorithm has low computational complexity and is a new method for obtaining depth information. This study provides a new idea for acquiring monocular depth information based on image brightness cues. The method analyzes the relationship between surface radiance and image brightness and uses the change in the intensity of a moving point light source to obtain the scene depth. The method requires only three pictures for processing, has simple hardware requirements and low computational complexity, and does not need salient scene edges or other geometric information. However, only the preliminary principle and performance verification of the proposed depth extraction method are presented here. In future work, the proposed method will be improved and optimized by analyzing non-ideal light sources, accounting for light reflection from objects, and using mixed surface reflection models to fit non-diffuse surfaces.
      Keywords: monocular vision; photometric stereo; surface radiance; image brightness change; depth recovery
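The core cue is the inverse-square decay of point-source intensity. The toy sketch below recovers depth from two images taken before and after moving the source a known distance along the viewing axis; the Lambertian surface, purely axial source motion, and the two-shot setup are simplifying assumptions for illustration only (the paper's method uses three pictures).

```python
import numpy as np

def depth_from_falloff(I_near, I_far, delta):
    """Toy depth recovery from inverse-square intensity decay.

    I_near : image with the point source at distance d from the surface
    I_far  : image after moving the source away by delta along the view axis
    Since I is proportional to 1/d^2:
        I_near / I_far = (d + delta)^2 / d^2,
    hence d = delta / (sqrt(I_near / I_far) - 1).
    """
    ratio = np.sqrt(np.maximum(I_near, 1e-6) / np.maximum(I_far, 1e-6))
    return delta / np.maximum(ratio - 1.0, 1e-6)
```

The ratio cancels the unknown surface albedo at each pixel, which is why depth can be recovered from brightness change alone without camera calibration.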
    • Anti-occlusion light-field depth estimation from adaptive cost volume

      Xiong Wei, Zhang Jun, Gao Xinjian, Zhang Xudong, Gao Jun
      Vol. 22, Issue 12, Pages: 1709-1722(2017) DOI: 10.11834/jig.170324
      Abstract: A light-field camera records the spatial and angular information of a scene within one shot. The spatial information reflects the positions in the scene, while the angular information reveals the views of the scene. Multi-view and refocused images can be obtained from light-field cameras, which gives them a unique advantage in depth estimation. Occlusion is a challenging issue for light-field depth estimation: previous works have failed to model occlusion or have considered only single occlusion, thereby failing to achieve accurate depth under multi-occlusion. In this study, we present a light-field depth estimation algorithm that is robust to occlusion within a multi-view stereo matching framework. First, we apply the digital refocusing algorithm to obtain refocused images. Then, we classify occlusions into non-occlusion, single-occlusion, and multi-occlusion types. Given that different occlusion types present dissimilar properties, we build the corresponding cost volume from the refocused images according to the occlusion type. Thereafter, we choose the optimal cost volume and calculate the local depth map in accordance with the min-cost principle. Finally, we utilize the graph cut algorithm to optimize the local depth results by combining the cost volume and a smoothness constraint in a Markov random field framework, improving the accuracy of depth estimation, and we apply the weighted median filter to remove noise and preserve image edge information. Experiments are conducted on the HCI synthetic dataset and the Stanford Lytro Illum dataset of real scenes. The proposed approach works better for occluded scenes than other state-of-the-art methods, with its MSE decreasing by approximately 26.8%. Our approach obtains a highly accurate, edge-preserved depth map and is robust to different occlusion types. In addition, its running-time efficiency outperforms that of other methods. Although our approach performs well in Lambertian scenes, it may fail in non-Lambertian scenes with glossy objects.
      Keywords: depth estimation; light field; occlusion; refocusing; cost volume
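A plain multi-view cost volume, the structure the paper adapts per occlusion type, can be sketched as photo-consistency variance over disparity-warped sub-aperture views; the integer-pixel warping via np.roll and the variance cost are simplifying assumptions.

```python
import numpy as np

def cost_volume_depth(views, offsets, disparities):
    """Variance-based photo-consistency cost volume and min-cost depth labels.

    views       : list of (H, W) sub-aperture images
    offsets     : per-view angular offsets (dy, dx) from the center view
    disparities : candidate disparity values (a proxy for depth)
    """
    cost = []
    for s in disparities:
        warped = [np.roll(v, (int(round(s * dy)), int(round(s * dx))), axis=(0, 1))
                  for v, (dy, dx) in zip(views, offsets)]
        cost.append(np.var(np.stack(warped), axis=0))   # low variance = photo-consistent
    cost = np.stack(cost)                               # shape (D, H, W)
    return cost.argmin(axis=0)                          # local depth by the min-cost principle
```

Near occlusions this plain variance is contaminated by occluder pixels, which is precisely why the paper selects among occlusion-specific cost volumes before the graph cut refinement.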
    • Liu Wanjun, Liang Xuejian, Qu Haicheng
      Vol. 22, Issue 12, Pages: 1723-1736(2017) DOI: 10.11834/jig.170079
      Abstract: Deep learning has been widely used in computer vision, and its increased number of network layers is its major difference from shallow learning. Deep learning can learn data through multi-level networks, construct a complex nonlinear function model to extract data features, combine low-level features into high-level features, and complete the classification and recognition of data. Deep learning can extract accurate features while avoiding the subjectivity and randomness of manual selection, since no human participates in the feature extraction process. The convolutional neural network (CNN) is an important deep learning model and is widely used in image classification and recognition tasks. Improving the convergence speed and recognition rate can promote the application and development of CNNs. A CNN possesses strong robustness because of its convolution and pooling operations during the feature extraction phase. It also exhibits a powerful learning capability owing to its multiple layers and rich parameters. Many researchers have improved the CNN for application in different fields. In this study, an adaptively enhanced CNN algorithm is proposed to improve the convergence speed and recognition accuracy of the CNN, reduce the training difficulty, optimize the convergence effect, and enhance the generalization capability. A CNN mainly comprises forward and back propagation for classifying and recognizing images. Forward propagation includes feature extraction and target classification, and back propagation includes feedback of the classification error and updating of the weights. The proposed algorithm adds an adaptive error enhancement process between forward and back propagation: it builds the adaptive enhancement model, constructs the CNN on the basis of this model, analyzes the causes of classification errors and the error feedback pattern during CNN classification and recognition, and trains the classification errors purposefully. The two largest values in the classification results are extracted as features, and their corresponding errors are enhanced, whereas the other error values remain unchanged. The classification features and weights of the CNN are thus enhanced adaptively across iterations, which accelerates the convergence of the CNN and improves the recognition rate. The degree to which the adaptive enhancement model optimizes the convergence speed, recognition accuracy, convergence effect, and generalization capability of the CNN is compared with that of other algorithms, and the generalization capability of the adaptively enhanced CNN is validated on various datasets. The time cost of each algorithm is also measured. The experiments are carried out on datasets of handwritten digits, handwritten characters, and hyperspectral images, and the results of different CNN-based image recognition and optimization algorithms on these datasets are compared. The comparative results show that the adaptively enhanced CNN algorithm improves the convergence speed and recognition rate to a large extent and optimizes the convergence effect and generalization capability. At convergence, the recognition error rate is reduced by 20.93% on handwritten digits, 11.82% on handwritten characters, and 15.12% on hyperspectral images. The adaptively enhanced CNN incurs no increase in time cost.
The proposed algorithm also achieves a better recognition effect than other CNN optimization algorithms. For example, the recognition error rate is reduced by up to 58.29% and 43.50% compared with the rates obtained by the dynamic adaptive pooling algorithm and the dual optimization algorithm, respectively. The proposed algorithm improves the effect of different gradient optimization algorithms, reducing the recognition error rate by up to 33.11%, and also shows various improvements in recognition rate compared with other image recognition algorithms. The adaptively enhanced CNN can enhance classification features adaptively; improvements in convergence speed and recognition rate and optimization of the convergence effect are demonstrated. The CNN can be improved effectively by the adaptive enhancement model without increasing the time cost. In addition, the proposed algorithm achieves its optimization effect with different gradient descent algorithms and can be further optimized on their basis. The adaptively enhanced CNN exhibits good generalization capability, and the algorithm can be further extended to other CNN-related deep learning algorithms.
      Keywords: deep learning; convolutional neural network; image processing; classification and recognition; feature extraction; adaptive feature enhancement
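One way to read the adaptive enhancement step inserted between forward and back propagation is sketched below; the gain factor alpha and the use of softmax outputs are hypothetical, since the abstract does not specify how the two largest classification values are amplified.

```python
import numpy as np

def enhance_error(output, target, alpha=1.5):
    """Amplify the error feedback of the two most confident classes.

    output : softmax scores of one sample, shape (num_classes,)
    target : one-hot label vector of the same shape
    alpha  : hypothetical enhancement gain (> 1)
    """
    err = output - target               # standard output-layer error
    top2 = np.argsort(output)[-2:]      # the two largest classification values
    err[top2] *= alpha                  # enhance their errors; others unchanged
    return err                          # fed into ordinary backpropagation
```

Because only the error signal is rescaled, the step adds a constant-time operation per sample, consistent with the reported absence of extra time cost.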
    • Liang Huagang, Yi Sheng, Ru Feng
      Vol. 22, Issue 12, Pages: 1737-1749(2017) DOI: 10.11834/jig.170251
      Abstract: Facial expression recognition (FER) is a major research topic in pattern recognition and computer vision and presents a wide application prospect. At present, FER methods usually focus on extracting features from 2D face images and analyzing local facial texture and contours to recognize facial expressions. Given the complexity and subtlety of facial expressions, accurately distinguishing expressions using only 2D features extracted from 2D face images is difficult, and the recognition effect decreases drastically when processing non-database images or when face poses and ambient light change. Traditional 2D FER methods are easily influenced by various factors, such as posture and illumination, and cannot effectively recognize certain confusable expressions. In this study, a method that combines 2D pixel features (2D PF) and 3D feature point features (3D FPF) based on the Kinect sensor is proposed to achieve robust real-time FER and overcome the above disadvantages of previous methods. First, 3D data of facial feature points are obtained with Kinect. The face image is segmented by the enclosing rectangle around the eyebrow and mouth areas, so the segmented face image contains neither the background nor the forehead and chin blocks; other irrelevant areas that do not reflect expression changes are excluded as well. Then, the classic LBP, Gabor, and HOG operators are used to extract 2D PF from the segmented face images. The computation of LBP, Gabor, and HOG feature extraction is relatively complex, hindering real-time operation of the algorithm. Accordingly, proper adjustments to the LBP, HOG, and Gabor extraction processes are made to reduce the computation cost, and the eigenvectors are dimensionally reduced to ensure the real-time performance of the algorithm. However, 2D PF has difficulty describing the feature changes of facial expressions and is sensitive to various extraneous factors. Thus, three types of 3D features, based on angles, distances, and normal vectors, are proposed to describe the deformation of the face in detail and improve the recognition of confusable expressions. Facial expression information mainly resides in the eyebrows, eyes, mouth, and other local areas. Interference can be reduced and the efficiency of feature extraction improved by excluding feature points unrelated to facial expression. Thus, the 3D features of angles, distances, and plane normal vectors between the connection lines of different feature points in the eyebrow and mouth areas are selected as feature vectors describing the changes in facial expression. However, the small number of feature points in the eyebrow and mouth areas and the low precision of the 3D data acquired with Kinect result in poor recognition when these features are used alone. Accordingly, 2D PF and 3D FPF are integrated to complete the recognition task and balance expression recognition accuracy against real-time performance. Finally, three sets of random forest models are trained with 2D PF and 3D FPF, and the weighting factors of six sets of random forest classifiers are assigned according to feature dimension size to mitigate the influence of the differences between 2D PF and 3D FPF. The final classification result is decided by the weighted average of the six sets of random forests, improving the robustness and recognition capability of the algorithm.
The effectiveness of the algorithm is verified by recognizing 9 different expressions (calmness, smile, laugh, surprise, fear, anger, sadness, meditation, and disgust) on the 3D expression database called Face 3D, which contains the 9 facial expression types of 10 individuals, with a total of 9 000 sets of images and feature point data. Experimental results show that the combination of 2D PF and 3D FPF is conducive to discriminating facial expressions. The average recognition rate for the 9 expressions on the Face 3D database is 84.7%, which is 4.5% higher than that of the best recently proposed method, which combines only a 2D high-dimensional HOG feature and a 3D angle feature. The average recognition rate is also 3.0% and 5.8% higher than the rates obtained using the 2D or 3D fusion features alone, respectively. The recognition rate reaches more than 80% for confusable expressions such as anger, sadness, and fear, and the system runs at 10 to 15 frames per second owing to the high-speed data acquisition of Kinect. The proposed method improves the expression-describing capability of the algorithm by combining 2D PF and 3D FPF, effectively reduces the interference among confusable expressions, and enhances robustness through the weighted average of the random forest classifiers. The proposed method is more beneficial to facial expression recognition than ordinary 2D or 3D features alone, with only an insignificant decrease in real-time performance.
      Keywords: multi-feature extraction; real-time facial expression recognition; random forest; Kinect depth sensor; multi-expression classification
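The dimension-weighted fusion of the six random-forest classifiers can be sketched as a weighted average of class posteriors; normalizing the feature dimensions into weights is an illustrative reading of the description above, not the authors' exact weighting.

```python
import numpy as np

def fuse_forests(posteriors, feature_dims):
    """Weighted average of per-feature random-forest class posteriors.

    posteriors   : list of arrays, each (num_classes,), one per classifier
    feature_dims : dimension of the feature set behind each classifier
    """
    w = np.asarray(feature_dims, dtype=float)
    w /= w.sum()                                     # dimension-proportional weights
    fused = sum(wi * p for wi, p in zip(w, posteriors))
    return int(np.argmax(fused))                     # final expression label
```

Weighting by feature dimension keeps the low-dimensional but discriminative 3D features from being drowned out by the high-dimensional 2D descriptors.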
    • Scene classification algorithm fusing visual perception

      Shi Jing, Zhu Hong, Wang Dong, Du Sen
      Vol. 22, Issue 12, Pages: 1750-1757(2017) DOI: 10.11834/jig.170232
      Abstract: Scene classification is an important part of machine vision. The content of a scene is identified by analyzing the objects in the scene and their relative positions. In recent years, the surge in image volume has introduced great challenges in image recognition, retrieval, and classification, and accurately obtaining the information users need from vast amounts of data is becoming increasingly urgent in this field. Early image recognition technology focused mainly on describing the low-level information of images. The bag-of-words model was first applied in document processing: it transforms a document into a combination of keywords and then performs matching on the basis of keyword frequency. In recent years, computer vision researchers have applied this method successfully to image processing. The image plays the role of the document in the bag-of-words model: visual words are generated by image feature extraction, and the bag-of-words representation of the image is built from the frequencies of the visual words. At present, an ideal classification effect is difficult to achieve because of the diversity and complexity of the internal structure of scenes. Physiological and psychological research has shown that the human visual system pays more attention to salient regions than to salient points; these regions are referred to as saliency regions, and the visual attention model is a major new research topic. Saliency analysis finds the most interesting and content-rich regions of an image by a given calculation method and represents them with a saliency map. In this study, a scene classification algorithm based on visual perception is proposed to address the key and difficult problems in scene classification. Specifically, the visual perception characteristics of the human eye are considered, and saliency detection is combined with the traditional bag-of-visual-words model. On the basis of visual saliency and the bag-of-visual-words model, this study fully considers the visual attention area of the human eye and avoids the shortcoming that simple low-level features fail to capture the interrelationships among targets. On this basis, a multi-scale fusion WSSIFT feature is established by screening and weighting the underlying features with the saliency of the regions of interest, which avoids neglecting important details and removes some redundant features. First, the image is decomposed at multiple scales, and the image features at each scale are extracted. Second, the salient area of the image is detected at each scale. Finally, the salient region information and the multi-scale features are integrated to constitute the multi-scale fusion WSSIFT feature and classify scenes. The proposed algorithm is tested on three standard datasets, namely, SE, LS, and IS, to verify its effectiveness, and the results are compared with those of different methods. The classification accuracy of the proposed method is improved by approximately 3% to 17%. The proposed scene classification algorithm effectively alleviates the limitations of simple feature description in expressing the image as a whole. It addresses the problem that scene classification methods based on the simple use of image features suffer from insufficient feature extraction and neglect the interrelation of objects in the scene, and it fully considers human visual perception.
While preserving the advantages of the local feature model, the fusion detection algorithm is used to study the overall saliency of the image, taking into account the interrelationship across the entire scene while enhancing the local information. Accordingly, the multi-scale fusion WSSIFT feature is constructed. Experimental results show that the proposed algorithm achieves a good classification effect on multiple datasets, and its results on the three standard datasets are superior to those of the other algorithms. The novel algorithm can be applied to other machine vision fields, such as the analysis, understanding, and classification of scenes.
      Keywords: visual perception; scene classification; multi-scale; feature fusion; WSSIFT feature
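One plausible reading of the saliency-weighted bag-of-visual-words idea is sketched below: each local descriptor votes for its nearest visual word with a weight taken from the saliency map at its keypoint. The nearest-neighbor assignment and the normalization are assumptions, and the paper's WSSIFT construction additionally fuses multiple scales.

```python
import numpy as np

def saliency_weighted_bow(descriptors, keypoints, saliency, codebook):
    """Saliency-weighted bag-of-visual-words histogram.

    descriptors : (N, D) local features (e.g., SIFT)
    keypoints   : (N, 2) keypoint (x, y) positions
    saliency    : (H, W) saliency map with values in [0, 1]
    codebook    : (K, D) visual-word centers
    """
    hist = np.zeros(len(codebook))
    for d, (x, y) in zip(descriptors, keypoints):
        k = np.argmin(np.linalg.norm(codebook - d, axis=1))  # nearest visual word
        hist[k] += saliency[int(y), int(x)]                  # saliency-weighted vote
    return hist / max(hist.sum(), 1e-9)                      # normalized histogram
```

Weighting the votes this way lets features inside visually attended regions dominate the representation, which is the mechanism the abstract credits for the accuracy gain.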
    • Shi Xue, Li Yu, Li Xiaoli, Zhao Quanhua
      Vol. 22, Issue 12, Pages: 1758-1768(2017) DOI: 10.11834/jig.170255
      Abstract: The development of remote sensing technology has improved the resolution of remote sensing images; thus, high-resolution remote sensing image segmentation has become a major research topic in remote sensing image processing. Geometrical details in high-resolution remote sensing images are more obvious than those in moderate-resolution images, but the improvement in spatial resolution increases the spectral similarity between different classes and the spectral differences within the same class. This phenomenon reduces the statistical separability of different classes, which causes many segmentation errors. The Gaussian mixture model (GMM) is a method of modeling the statistical distribution of data and has been successfully applied to image segmentation. GMM-based image segmentation has a simple structure and several advantages. However, the GMM considers only the effect of each pixel itself on the segmentation and is thus sensitive to image noise. The robustness of the GMM is improved by introducing neighborhood relationships into the GMM through a Markov random field (MRF). MRF-based GMM segmentation methods using standard expectation maximization (EM) encounter a computational problem in parameter estimation, as EM fails to obtain the global solution of the segmentation model. In this study, an image segmentation method is proposed to solve this problem in high-resolution remote sensing image segmentation. The proposed method combines an MRF-based GMM with the nonlinear conjugate gradient method (CGM). In the algorithm, the GMM models the statistical distribution of pixel intensities in a remote sensing image. The components of the GMM are Gaussian distributions, each modeling the statistical distribution of pixel intensities in one homogeneous area. The MRF introduces neighborhood relationships into the GMM to reduce noise effects; in other words, the prior distribution of the GMM weight coefficients is modeled by the MRF. The segmentation model, namely, the quality function, is built by combining the GMM with the prior distribution of the GMM weight coefficients on the basis of Bayesian theory. The quality function includes a large number of parameters, such as the weight coefficients, means, and covariances, and possesses a fairly complicated structure, which makes solving for the model parameters difficult. Therefore, the proposed algorithm defines the means and covariances as functions of the weight coefficients according to their relationships, thereby reducing multiple parameters to only one. Although the quality function then involves only one parameter, its structure is still complicated, so a nonlinear CGM is designed for the estimation. This method uses only the value of the quality function and the parameter gradient, reduces the complexity of the parameter solution, converges quickly, and can obtain the global optimal solution. The loss function is defined as the negative quality function to obtain the optimal segmentation, and the gradient with respect to the weight coefficients can be derived easily from this function. Segmentation experiments are conducted using the proposed algorithm, the GMM-CGM algorithm, and the spatially variant finite mixture model (SVFMM). To test the noise resistance of the proposed algorithm, salt-and-pepper noise is added to synthetic and high-resolution remote sensing images.
The segmentation results demonstrate that the proposed algorithm effectively improves noise resistance and obtains better results than the compared algorithms. A comparison of the estimated parameters shows that the proposed algorithm can reach the global solution, whereas the compared algorithms obtain only local solutions. The overall accuracy and Kappa coefficient are calculated from the confusion matrix and compared with those of the other algorithms to evaluate the proposed algorithm quantitatively. The accuracy values demonstrate that the proposed algorithm achieves more precise segmentation results than the compared algorithms. This study proposes a high-resolution remote sensing image segmentation method that combines an MRF-based GMM with the nonlinear CGM. The proposed approach is promising and effective and presents ideal results but still needs improvement. Many other functions can be paired with the nonlinear CGM to estimate the model parameters, and other parameter estimation methods that can conveniently and accurately obtain the optimal solution will be explored in future work.
      Keywords: high-resolution remote sensing image segmentation; Gaussian mixture model; Markov random field; conjugate gradient method; global optimal solution
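The nonlinear conjugate gradient update that drives the parameter estimation can be sketched generically as follows; the Fletcher-Reeves coefficient and the fixed step size standing in for a line search are simplifying assumptions, and grad would be the derivative of the negative quality function with respect to the weight-coefficient parameter.

```python
import numpy as np

def nonlinear_cg(x0, grad, steps=200, lr=1e-2):
    """Generic nonlinear conjugate gradient (Fletcher-Reeves) minimizer.

    x0   : initial parameter vector (e.g., the GMM weight-coefficient field)
    grad : function returning the gradient of the loss at x
    """
    x = x0.copy()
    g = grad(x)
    d = -g                                          # first direction: steepest descent
    for _ in range(steps):
        x = x + lr * d                              # fixed step stands in for a line search
        g_new = grad(x)
        beta = (g_new @ g_new) / max(g @ g, 1e-12)  # Fletcher-Reeves coefficient
        d = -g_new + beta * d                       # conjugate direction update
        g = g_new
    return x
```

Unlike EM, each update uses only function values and gradients, which is the property the paper exploits to keep the single-parameter estimation tractable.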
    • Group behavior simulation based on group dynamics

      Yang Shanwen, He Wu, Rao Yunbo
      Vol. 22, Issue 12, Pages: 1769-1778(2017) DOI: 10.11834/jig.170183
      Abstract: The modeling of virtual crowds has been widely investigated in recent years. Two fundamental approaches are used to model human crowds. The first employs microscopic methods and mainly includes agent-based, force-based, and rule-based models. In these models, each agent perceives environmental information and responds to static or dynamic obstacles on its own. Microscopic models are suitable for small crowds and provide flexibility. The second approach, applicable to large-crowd simulations, treats the crowd as a whole, uses macroscopic methods, and mainly includes fluid dynamics, continuum models, and potential fields. However, very few algorithms combine the two models to simulate dynamic group behavior. Group dynamics, which has been extensively studied in social psychology, attempts to find the general rules of crowd movement through a dynamic analysis of group phenomena. The founder of this concept, Kurt Lewin, considered individual behavior to be the result of personality characteristics and environmental influence. Recent researchers have proposed different theories to explain group behavior, but current simulation methods cannot generate believable and heterogeneous crowd simulations because global planning and local motion are separated. This study proposes a new method that combines global path planning and local motion control to simulate diverse group behavior. In particular, group dynamics is introduced into continuum crowd simulation to model the following behavior within groups and the avoidance behavior between groups. First, the environment is divided into a series of 2D grids, the target and obstacle grids are specified, the individuals are converted into unit density fields, and a crowd flow constraint is introduced to calculate the maximum speed field. Second, the unit cost field is computed by minimizing a linear combination of the path length, the travel time to the destination, and the discomfort per unit time along the path. Third, three lists, namely, the known, unknown, and candidate lists, are established, and the target grid is stored in the known list. Finally, the fast marching method and an upwind difference scheme are used to approximate the gradient, constructing a global potential field that provides each individual with an initial velocity. In the second phase, individuals are assigned to groups depending on their walking speeds, moving directions, and locations. Then, the divide-and-conquer algorithm is employed to construct the group convex hull, and the group position is the average position of its edge members. Finally, the convex hull edge is expanded to a limited extent, and a local potential field is constructed from the space it sweeps during a time step. In the local motion control phase, the global and local potential fields are integrated to generate group avoidance behavior, and each individual's local motion is adjusted to produce following behavior on the basis of a following acceleration. After updating the global potential information at each time step, crowd simulation results with different numbers of individuals and grid resolutions are compared. Experimental results show that the proposed method can model large-scale crowd simulation efficiently and diversely. For example, when simulating 5 000 individuals walking in a scenario with a grid resolution of 80×80, the average frame time is 35.7 ms, approximately 28 frames per second.
Compared with the continuum model, the proposed method can produce richer group behavior, even in high-density areas, because individuals dynamically avoid one another by following their leaders to resolve local interactions. When constructing the global potential field, the fast marching method is influenced by the grid resolution, but a coarse grid can still be used to compute smooth trajectories. At the same time, the proposed algorithm can generate considerable group behavior on the basis of group dynamics. Existing group behavior models employ additional collision avoidance methods to realize the local movement of a small crowd and thus must sometimes handle several special circumstances, which leads to unnecessary computations. Our proposed method integrates local motion control into global path planning and is thus suitable for large-scale diverse crowd simulation. During the simulation process, the method can produce intra-group following behavior and inter-group avoidance behavior while using the continuum model. Therefore, the proposed diverse crowd motion simulation algorithm is efficient.  
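As an illustration of the local motion phase described above, the following is a minimal Python sketch, not the authors' implementation: each individual descends the gradient of a global potential field, and intra-group members receive an additional following acceleration toward their leader. The toy distance-to-goal potential and parameters such as follow_gain and speed are illustrative assumptions.

import numpy as np

def step_crowd(positions, leaders, potential, dt=0.1, follow_gain=0.5, speed=1.4):
    # Advance all individuals one time step on a 2D grid potential.
    # Numerical gradient of the potential (axis 0 = y/rows, axis 1 = x/cols).
    gy, gx = np.gradient(potential)
    new_positions = positions.copy()
    for i, (x, y) in enumerate(positions):
        xi = int(np.clip(round(x), 0, potential.shape[1] - 1))
        yi = int(np.clip(round(y), 0, potential.shape[0] - 1))
        # Global guidance: move against the potential gradient.
        v = -np.array([gx[yi, xi], gy[yi, xi]])
        norm = np.linalg.norm(v)
        if norm > 1e-9:
            v = speed * v / norm
        # Intra-group following: accelerate toward the leader's position.
        if leaders[i] >= 0 and leaders[i] != i:
            v = v + follow_gain * (positions[leaders[i]] - positions[i])
        new_positions[i] = positions[i] + dt * v
    return new_positions

# Toy usage: an 80x80 grid with the goal in one corner, 100 individuals
# all following agent 0 (the leader, which follows no one).
h = w = 80
yy, xx = np.mgrid[0:h, 0:w]
potential = np.hypot(xx - (w - 1), yy - (h - 1))  # distance-to-goal stand-in
positions = np.random.rand(100, 2) * 20
leaders = np.zeros(100, dtype=int)
leaders[0] = -1
positions = step_crowd(positions, leaders, potential)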
      关键词:group dynamics;crowd simulation;potential field;collision avoidance;group behavior;intra-group following behavior;inter-group avoidance   
    • Adaptive marked watershed segmentation algorithm for red blood cell images

      Wang Ya
      Vol. 22, Issue 12, Pages: 1779-1787(2017) DOI: 10.11834/jig.170330
      摘要:Accurate segmentation of microscopic cell images is the premise and key of computer-aided diagnosis. However, microscopic images of red blood cells always contain strongly adherent, overlapping, and pathological cells, thereby complicating accurate segmentation. This study proposes an effective algorithm based on the watershed transform to segment these cells in the HSI space. The segmentation accuracy of cell images is improved by adaptively marking the foreground and background areas and reconstructing the gradient maps. The marked watershed transform algorithm is widely used in the segmentation of microscopic cell images because of its simplicity and efficiency. The key to the effectiveness of the algorithm is to accurately mark the foreground cells. However, the presence of strongly adherent, overlapping, and pathological cells seriously affects the accuracy of the markers. To overcome this problem, this study adopts low-pass filtering and proposes an adaptive heuristic algorithm. The proposed algorithm marks the reflective center regions of red blood cells, which arise from their non-nuclear characteristics, and finds the cytopathic regions from the change in the saturation of pathological cells. Thus, the interference with segmentation from texture changes in cells caused by uneven illumination and from cytopathic regions is reduced. Then, in the HSI space, the local extreme points of the low-frequency component are extracted from the gradient maps of the S and I components. The reflection regions and the low-frequency extreme points are combined as the initial markers of the image foreground. Thereafter, pseudo markers are removed in accordance with the different features of these marks to ensure that the overlapped regions of adherent cells are not marked. After obtaining the foreground markers, the background markers are derived from the binary image through morphological operations. Subsequently, a modified gradient map is reconstructed using principal component analysis to extract the information from the S and I components of the gradient map, thus suppressing abnormal gradient values and obtaining highly detailed gradients. The reconstructed gradient map retains considerable gradient detail and reduces noise interference. Finally, the marked watershed transform is applied to the reconstructed gradient map, and image segmentation is realized. An experiment verifying the effectiveness of the proposed method is conducted. Four types of red blood cell images containing strongly adherent, overlapping, and pathological cells from the American Society of Hematology database are utilized for segmentation. Experimental results are quantified through three indexes, namely, average under-segmentation rate, average over-segmentation rate, and average accuracy, to objectively evaluate the performance of different algorithms. The first two indexes of the proposed algorithm are 2.23% and 1.67%, which are significantly lower than the results of the two other watershed algorithms in the literature. The segmentation accuracy is as high as 96.10%. The segmentation performance of the proposed algorithm is better than that of the two other algorithms. With regard to timeliness, the average running time of the proposed algorithm, 6.06 s, is compared with those of the two other algorithms. This study proposes a segmentation algorithm for color images of red blood cells containing strongly adherent, overlapping, and pathological cells. 
Strongly adherent and overlapping cells can be marked adaptively and accurately by taking advantage of the saturation and brightness information from the S and I components in the HSI space. Principal component analysis can effectively preserve the original boundaries of the overlapping cells. This adaptive marked watershed algorithm not only separates strongly adherent and overlapping cells but also effectively suppresses the influence of the shape and internal nature of pathological cells on the segmentation. The segmentation results of the proposed algorithm are outstanding in terms of both the accuracy of the cell count and the preservation of cell morphology. The algorithm is insensitive to cytopathic regions and possesses good robustness. It can be widely used in the segmentation of microscopically stained images with round overlapping and adherent cells, such as red blood cells.  
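As a rough illustration of marker-controlled watershed segmentation of the kind described above, the following Python sketch, not the paper's implementation, uses scikit-image in HSV space as a stand-in for HSI: the intensity channel is low-pass filtered, its local maxima serve as crude foreground markers (red blood cells have bright, non-nuclear centers), and the watershed runs on a Sobel gradient. The paper's adaptive marking and PCA-fused S/I gradient are not reproduced; the sigma, distance, and threshold choices are illustrative assumptions.

import numpy as np
from scipy import ndimage as ndi
from skimage import color, feature, filters, segmentation

def segment_cells(rgb_image):
    hsv = color.rgb2hsv(rgb_image)            # HSV as a stand-in for HSI
    s, i = hsv[..., 1], hsv[..., 2]
    # Low-pass filtering suppresses texture inside cells.
    i_smooth = ndi.gaussian_filter(i, sigma=3)
    # Crude foreground markers: local maxima of the smoothed brightness.
    coords = feature.peak_local_max(i_smooth, min_distance=10)
    markers = np.zeros(i.shape, dtype=int)
    markers[tuple(coords.T)] = np.arange(1, len(coords) + 1)
    # Gradient map: a plain Sobel on I (the paper fuses S and I via PCA).
    gradient = filters.sobel(i_smooth)
    # Background mask from a global saturation threshold (assumption).
    mask = s > filters.threshold_otsu(s)
    return segmentation.watershed(gradient, markers, mask=mask)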
      关键词:marked watershed transformation;hue-saturation-intensity (HSI) model;low-pass filtering;principal component analysis (PCA);cell image segmentation   
    • Han Jie, Guo Qing, Li An
      Vol. 22, Issue 12, Pages: 1788-1797(2017) DOI: 10.11834/jig.170222
      摘要:Road extraction in complex scenes from high-resolution remote sensing images is mostly performed using supervised classification methods. However, such methods require the manual selection of samples, thereby leading to a low degree of automation and instability. Pixel-based extraction methods exhibit low completeness and easily produce salt-and-pepper noise, while object-oriented methods easily produce adhesion problems. In this study, a road extraction method based on unsupervised classification and geometrical-textural-spectral features (i.e., unsupervised classification and fused features, UCFF) is proposed to improve the completeness, accuracy, and automation of road extraction. In the UCFF method, the unsupervised classification avoids the manual sample selection steps, and the fusion of spectral, geometrical, and textural features produces extraction results of high accuracy. The proposed UCFF road extraction method thus maintains high completeness and accuracy while improving the degree of automation. The UCFF method mainly includes three parts: road candidate area extraction, non-road area filtering, and road centerline extraction. The road candidate area extraction involves two steps. First, spectral features are used for unsupervised classification with the ISODATA method. However, non-road areas with similar spectral features are easily misclassified as roads by the unsupervised classification. Second, another classification based on textural features is therefore applied. Given that most non-road and road areas significantly differ in texture, this classification can distinguish road areas from non-road areas with similar spectral features. The textural feature used is the variance along four directions: horizontal, vertical, diagonal, and back-diagonal. A 3×3 sliding window around a center pixel is used to calculate the variance; when all the variances in the four directions are less than a threshold, the center pixel is considered road. By combining the two classification results, the road candidate region is obtained. For the non-road area filtering, edge, texture, and shape filters are applied in turn. The edge filter is used to disconnect road and non-road connections. The Canny method is used to extract edges, and roads are assumed not to lie on edges; if a point in the image is an edge point, then it is not a road point. The textural feature is used to filter large non-road areas that are wider than the road, such as large parking lots and buildings. In the binary image, the road is a narrow, long connected area. A window slightly larger than the width of the road is selected; in this window, the textural feature value of a non-road area is less than that of a road area, so when the textural feature value is less than a threshold, the center pixel is considered non-road and is filtered out. Finally, a shape index (i.e., a linear feature index) is used to filter the remaining small non-road areas and thus extract the final road area. For the road centerline extraction, a tensor voting algorithm is used. Experiments on high-resolution remote sensing images are conducted to verify the effectiveness of the proposed method. High-resolution IKONOS images of Australia's Hobart region and QuickBird images of Texas, USA, in complex scenes are employed. 
Two other typical pixel-based and object-oriented road extraction methods from the domestic and international literature are compared to further prove the feasibility and effectiveness of the proposed method. Three indexes, namely, completeness, accuracy, and detection quality, are used for quantitative evaluation. Quantitative analysis shows that the three indexes of the proposed method are on average 26.61%, 5.57%, and 26.77% higher than those of the two other algorithms. The qualitative analysis also shows that the proposed method can effectively suppress the salt-and-pepper noise and adhesion phenomena. The method also presents high automation capability. The proposed road extraction method is mainly intended for complex scenes in high-resolution remote sensing images. The preliminary classification under unsupervised conditions avoids the sample selection steps in an automatic manner. The spectral feature is considered in the unsupervised classification, while the textural feature of the road is fully utilized in the texture classification. Thereafter, the two classification results are combined as the candidate region to effectively avoid the adhesion and salt-and-pepper noise phenomena. The road extraction results are smooth and complete. In the non-road filtering stage, the specific nature of the road is fully used, combining edge, shape, and textural features to improve the accuracy of road extraction. In summary, a road extraction method for high-resolution remote sensing images based on unsupervised classification and geometrical-textural-spectral features is proposed. The unsupervised classification method possesses a higher degree of automation than the supervised classification method. Road extraction in complex scenes based on geometrical-textural-spectral features can effectively avoid the salt-and-pepper noise and adhesion phenomena while obtaining high completeness and accuracy. This method is suitable for road extraction in complex scenes, such as cities, and achieves high completeness, accuracy, and automation. Unsupervised classification and multiple feature combination have broad application prospects in road extraction, and the approach can serve as a reference for other road extraction studies in the future.  
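The directional-variance texture test described above can be made concrete with a short sketch. The following Python code is one interpretation, not the authors' implementation: within each 3×3 window, the variance of the three pixels along each of the four directions is computed, and the center pixel is kept as a road candidate only when all four variances fall below a threshold, whose value here is an illustrative assumption.

import numpy as np

def directional_variance_mask(gray, threshold=25.0):
    # Boolean mask of texture-homogeneous (road-like) pixels.
    g = gray.astype(np.float64)
    h, w = g.shape
    mask = np.zeros((h, w), dtype=bool)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lines = (
                g[y, x - 1:x + 2],                                      # horizontal
                g[y - 1:y + 2, x],                                      # vertical
                np.array([g[y - 1, x - 1], g[y, x], g[y + 1, x + 1]]),  # diagonal
                np.array([g[y - 1, x + 1], g[y, x], g[y + 1, x - 1]]),  # back-diagonal
            )
            if all(line.var() < threshold for line in lines):
                mask[y, x] = True
    return mask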
      关键词:unsupervised classification;texture feature classification;edge filter;shape filter;road extraction   
    • Lin Yuzhun, Zhang Baoming, Wang Dandi, Chen Xiaowei, Xu Junfeng
      Vol. 22, Issue 12, Pages: 1798-1808(2017) DOI: 10.11834/jig.170346
      摘要:Buildings are important infrastructure in a geodatabase. Extracting buildings effectively and accurately is significant for urban planning, population distribution, land analysis, and environmental surveys. However, buildings cannot be extracted with a unified model because of the common phenomenon of "same object with different spectra, same spectrum with different objects" and because the spectra, shapes, and textures of buildings are not exactly the same. Considering that most buildings are rectangular and that the spectral information of a building is generally more uniform than that of other objects, some buildings can be extracted with shape and gradient information. On this basis, open spaces and water are removed using the shadow as a context feature. The texture of the imagery within buildings usually presents good consistency within a certain area. Thus, other buildings can be extracted by a probability model established from the textural features of the initially extracted buildings, which serve as training samples. On the basis of the analysis above, a hierarchical building extraction method based on spectral, spatial, contextual, and textural features is proposed. The proposed method includes three main parts, namely, image preprocessing, initial extraction of buildings, and final extraction of buildings, and is applied to single high-resolution remote sensing images. High-frequency noise is removed and edge information is effectively retained by preprocessing based on a bilateral filter. Vegetation and shadow are extracted using the super green and morphological shadow indexes. The building index is constructed using multi-scale and multi-orientation gradient operators, and some rectangular buildings are extracted using the building index and shape features. The extracted buildings are then expanded with multi-direction linear structuring elements. A voting matrix is calculated from the number of pixels in the intersection of each expansion result and the shadow mask to determine the light direction, and falsely extracted buildings are eliminated on the basis of the shadow feature and the light direction. Thus, the initial extraction of buildings is completed. Finally, a pixel-level building extraction result is obtained by the probability model, which is established using the textural feature vectors of the initial extraction result. The final extraction result is achieved by judging the number of pixels and the proportion of the objects generated by imagery segmentation. Two high-resolution remote sensing images from Okinawa and Chicago are used to verify the validity of the proposed method. Results before and after using the shadow feature are compared, and the proposed initial extraction method is compared with the neighborhood total variation method and the Sobel operator. Experimental results show that the proposed method can recognize buildings in different directions and provide highly reliable building samples for post-processing. Shadow as a context feature can effectively avoid confusion among open spaces, roads, and other objects. In the building extraction experiment, the proposed method is compared with the morphological building index + morphological shadow index algorithm and the neighborhood total variation + mixed Gaussian model + Bayesian decision algorithm. The experiments are evaluated using precision rate, recall rate, and F-score, and the final accuracies of our experiments are significantly improved. 
The precision rate increases by 2.90 percentage points, the recall rate increases by 12.49 percentage points, and the F-score increases by 8.84 percentage points. This study presents a method that combines spectral, shape, contextual, and textural features for building extraction. The proposed method adopts a hierarchical extraction strategy. First, building samples are extracted on the basis of the characteristics of their spectral information and their rectangular shape. Then, a probability model is established using the textural features of the samples to complete the building extraction. The proposed method is grounded in the characteristics of high-resolution remote sensing imagery. It exhibits significant advantages in areas where buildings are dense and the imagery is complex and can extract building targets with various shapes and spectral characteristics. The proposed method can also extract building objects with strong anti-jamming capability, high accuracy, and great applicability. However, effectively determining the optimal scale in post-extraction and handling obstructed buildings remain to be further studied. Given that the textural feature mainly depends on post-extraction, the recognition of buildings whose textural features differ from those of the samples is poor and should therefore be enhanced in future works.  
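The shadow-voting step described above, which dilates candidate buildings with directional linear structuring elements and counts their overlap with the shadow mask, can be sketched as follows. This Python code is an interpretation, not the authors' implementation, and the structuring-element length is an illustrative assumption.

import numpy as np
from scipy import ndimage as ndi

def linear_selem(direction, length=7):
    # Linear structuring element along one of four example directions.
    se = np.zeros((length, length), dtype=bool)
    c = length // 2
    if direction == "horizontal":
        se[c, :] = True
    elif direction == "vertical":
        se[:, c] = True
    elif direction == "diagonal":
        np.fill_diagonal(se, True)
    else:  # back-diagonal
        np.fill_diagonal(np.fliplr(se), True)
    return se

def vote_light_direction(building_mask, shadow_mask):
    # Pick the direction whose dilated buildings best overlap the shadows.
    votes = {}
    for d in ("horizontal", "vertical", "diagonal", "back-diagonal"):
        expanded = ndi.binary_dilation(building_mask, structure=linear_selem(d))
        votes[d] = int(np.logical_and(expanded, shadow_mask).sum())
    return max(votes, key=votes.get)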
      关键词:high resolution remote sensing imagery;building extraction;multi-feature fusion;building index;Gaussian model   