Latest Issue

    Vol. 22, No. 5, 2017
    • Image engineering in China: 2016

      Zhang Yujin
      Vol. 22, Issue 5, Pages: 563-574(2017) DOI: 10.11834/jig.170115
      Abstract: This is the twenty-second in the annual survey series of important bibliographies on image engineering in China. The purpose of this statistical analysis is to capture the up-to-date development of image engineering in China, to provide a targeted literature-searching facility for readers working in related areas, and to supply useful recommendations for journal editors and potential authors. Considering the wide distribution of related publications in China, 728 references on image engineering research and techniques were carefully selected from 2 938 research papers published in 148 issues of a set of 15 Chinese journals. These 15 journals are considered important journals in which papers concerning image engineering are of higher quality and relatively concentrated. The selected references are classified first into 5 categories (image processing, image analysis, image understanding, technique application, and survey), and then into 23 specialized classes according to their main contents (the same as in the last 11 years). Analysis and discussion of the statistics, compiled by journal and by category, are also presented. According to the analysis of the 2016 statistics, image analysis is receiving the most attention, the number of publications on target tracking remains at a high level, image segmentation and edge detection are still a focus of research, image matching and fusion in remote sensing, mapping, and other applications remain a hot spot, and applications of bio-medical images increased significantly this year. This work presents a general and ready-to-use picture of the various advances, in both depth and breadth, of image engineering in China in 2016.
      Keywords: image engineering; image processing; image analysis; image understanding; technique application; literature survey; literature statistics; literature classification; bibliometrics
      Updated: 2024-05-07
    • Li Tingting, Jiang Zhaohui, Rao Yuan, Zhang Xiaoming
      Vol. 22, Issue 5, Pages: 575-583(2017) DOI: 10.11834/jig.160504
      Abstract: Sensitivity to the initial value and poor anti-noise performance are two important factors affecting fuzzy c-means (FCM) clustering in image segmentation. In this study, image segmentation based on gene expression programming (GEP) and spatial fuzzy clustering is proposed to address these two problems. GEP is a novel adaptive evolutionary algorithm that solves problems by simulating biological gene structure and genetic evolution. GEP performs well, but it currently has few applications in remote-sensing image segmentation. The standard FCM deals only with the gray-level information of pixels. However, the pixels of an image are highly correlated, and pixels within a neighborhood have almost the same data characteristics. Therefore, the spatial relationships among adjacent pixels must be considered an important feature in image segmentation. GEP has a unique structure, a more flexible encoding method, and richer genetic operators, which make it a better search method than its alternatives. In the proposed method, the GEP algorithm is introduced for the initial segmentation at the first stage. The clustering centers are encoded into chromosomes as the action objects in GEP. Through genetic operations, namely, selection, recombination, and mutation, the chromosomes that represent individuals evolve to the next generation. A fitness function is used to evaluate each individual; in this study, it is set as the reciprocal of the FCM objective function. After a certain number of generations, the individual with the highest fitness value is kept as the initial solution. At the second stage, a spatial function is introduced to reduce the adverse effects of noise points on the segmentation. With the spatial function values of pixels included, the membership function is redefined. The overall process of spatial fuzzy clustering is the same as that of FCM; however, the initial value comes from the result of the first stage. Segmentation experiments on a noisy synthetic image and noisy Berkeley images show that the performance of the proposed method in terms of partition coefficient (VPC), clustering entropy (VPE), and peak signal-to-noise ratio (PSNR) is much better than that of two classical clustering segmentation algorithms. The average VPC values are 0.062 4 and 0.061 1 higher than those of the classical algorithms. The average VPE decreases by 0.117 0 and 0.101 1, and the average PSNR increases by approximately 13.312 1 and 3.308 4. Although the run time required by the proposed method increases, the number of iterations required for the solution is greatly reduced. Segmentation experiments on six images from the Berkeley image library show that the VPC values of the proposed method are all above 0.93, which is 0.157 6 and 0.013 3 higher than the values of the two comparison methods. The mean PSNR increases by 2.896 3 and 1.934 4, and the VPE values are in the vicinity of 0.1 and lower than those of the comparison methods. In the multi-target segmentation experiment, the segmentation performance of all three methods decreases as the number of clusters increases. However, the performance curve of the proposed method is the smoothest, and its results are the least affected by the number of clusters. The proposed method has strong anti-noise capability, high segmentation accuracy, and stability. It is suitable for segmentation scenarios in which highly accurate results are required and time requirements are not strict.
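The key second-stage modification to FCM — reweighting each pixel's membership by a spatial function summed over its neighborhood — can be sketched as follows. This is a minimal illustration in Python/NumPy; the function names, the 3×3 window, and the exponents p and q are our assumptions, not the paper's exact formulation:

```python
import numpy as np

def fcm_memberships(pixels, centers, m=2.0):
    # Classical FCM membership for 1-D gray levels: u_ij proportional to d_ij^(-2/(m-1)).
    d = np.abs(pixels[:, None] - centers[None, :]) + 1e-10  # (N, C) gray-level distances
    u = d ** (-2.0 / (m - 1.0))
    return u / u.sum(axis=1, keepdims=True)

def spatial_function(u, shape, win=3):
    # h_ij: summed membership of cluster j over the spatial window around pixel i.
    C = u.shape[1]
    U = u.reshape(shape + (C,))
    r = win // 2
    P = np.pad(U, ((r, r), (r, r), (0, 0)), mode="edge")
    H = np.zeros_like(U)
    for dy in range(win):
        for dx in range(win):
            H += P[dy:dy + shape[0], dx:dx + shape[1], :]
    return H.reshape(-1, C)

def spatial_fcm_memberships(pixels, centers, shape, m=2.0, p=1, q=1):
    # Redefined membership: u'_ij proportional to u_ij^p * h_ij^q, renormalized.
    u = fcm_memberships(pixels, centers, m)
    h = spatial_function(u, shape)
    w = (u ** p) * (h ** q)
    return w / w.sum(axis=1, keepdims=True)
```

An isolated noisy pixel, whose gray level favors the wrong cluster, is pulled toward the cluster of its neighbors by the spatial term — the anti-noise effect the abstract describes.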
      Keywords: image segmentation; fuzzy c-means; gene expression programming; clustering; spatial function
    • Zhou Shuaijun, Ren Fuji, Du Jun, Yang Sai
      Vol. 22, Issue 5, Pages: 584-595(2017) DOI: 10.11834/jig.160387
      Abstract: Salient region detection based on priors has captured the attention of researchers in recent years. Based on experience, researchers assume that certain regions are foreground or background and then detect salient objects on these assumptions. However, methods based on a single prior have weaknesses. For example, several classic methods based on the center-bias prior assume that objects lie near the image center and disregard objects far from the center. Methods based on the background-bias prior take the boundaries as background and calculate the saliency values of other regions according to their contrast with the background; objects on the boundaries cannot be highlighted. To solve these problems, a novel salient object detection algorithm based on the integration of the center-bias prior and the background-bias prior is proposed. First, the input image is divided into superpixels. Second, a graph is constructed with superpixels as nodes. Boundary nodes are selected as absorbing nodes, and the others are taken as transient nodes. The absorbed time from each transient node to the boundary absorbing nodes is computed based on the absorbing Markov chain and taken as the background-bias prior value, yielding a single-scale background-bias prior map. Third, corner points of the input image are detected with a novel Harris corner detection algorithm. A two-dimensional Gaussian function with its peak at the cluster center of the corner points is constructed, and the center-bias prior value of each superpixel is obtained through this Gaussian function. Fourth, the two prior maps are combined into the final saliency map. Results from multi-scale saliency maps, obtained with different-scale superpixel segmentations, are integrated to further improve detection performance. Extensive experiments comparing state-of-the-art methods and the proposed method are conducted on the ASD, SED1, SED2, and SOD benchmark datasets.
Experimental results show that the proposed method performs favorably against 10 state-of-the-art methods in terms of precision, recall, and F-measure. Compared with the method based on the absorbing Markov chain alone, the evaluation metrics of the proposed method increase by at least 3%. Every method is also evaluated subjectively; the result of the proposed method is the best in both local detail and global aspects. In contrast to methods based on a single prior, the proposed method, which integrates the center-bias prior and the background-bias prior, can extract finer features of the input image. The saliency maps not only show global contrast but also carry highly detailed information, so the proposed method can highlight the salient object more effectively. A single object in an image can be detected accurately; however, owing to the limitation of the center-bias prior, the proposed method still has some imperfections on images with more than one object.
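The background-bias prior rests on a standard result for absorbing Markov chains: the expected number of steps before a transient node is absorbed is t = (I − Q)⁻¹·1, where Q is the transient-to-transient block of the transition matrix. A minimal sketch, in which the superpixel affinity matrix and node indexing are illustrative stand-ins for the graph the paper builds:

```python
import numpy as np

def absorbed_times(W, absorbing):
    # W: symmetric non-negative affinity matrix between superpixel nodes (N x N).
    # absorbing: indices of boundary superpixels used as absorbing nodes.
    N = W.shape[0]
    absorbing = np.asarray(absorbing)
    transient = np.setdiff1d(np.arange(N), absorbing)
    P = W / W.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
    Q = P[np.ix_(transient, transient)]    # transient -> transient block
    # Expected absorbed time: t = (I - Q)^{-1} 1 (fundamental-matrix identity).
    t = np.linalg.solve(np.eye(len(transient)) - Q, np.ones(len(transient)))
    times = np.zeros(N)
    times[transient] = t                   # absorbing (boundary) nodes get time 0
    return times
```

Nodes far from the boundary take longer to be absorbed and therefore receive higher background-bias saliency, which is exactly the intuition the abstract describes.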
      Keywords: salient object detection; center-bias prior; background-bias prior; multi-scale detection
    • Vector road matching algorithm based on a topological path

      Li Xiang, Ma Shuang, Yang Hui, Zhang Xiaonan
      Vol. 22, Issue 5, Pages: 596-609(2017) DOI: 10.11834/jig.160386
      Abstract: The aided navigation method based on track matching has evident advantages in both resistance to external interference and independence. Its key concept is to compare high-precision road data from navigation electronic maps or network databases with vehicle trajectory data obtained from sensor measurements. The location of a vehicle can be determined from road information; hence, the error of an integrated navigation system can be corrected and localization accuracy improved to a certain extent. Under special road conditions such as intersections, overpasses, or ramps, false road matches can occur, which directly affect the navigation results. A vector road, generally composed of discrete point sets, exhibits a high degree of accuracy in positioning coordinates and topological relations. Thus, the relationship between road topology and path can be fully utilized in a matching algorithm. A vector road matching algorithm based on road tracing is proposed to address the problems of matching algorithms based on road shape features. The road network is preprocessed into a ring topology, which includes nodes, roads, and intersections, in accordance with road requirements. Four states are defined: initialization, tracing, intersection, and searching. The algorithm traces the road according to the actual state characteristics, and different processing steps are performed for matching during road tracing. Error correction of the vehicle trajectory is achieved by analyzing the road tracing results. Moreover, the algorithm statistically analyzes the road tracing conditions for intersection tracking under different thresholds to ensure the accuracy of the matching results. The vector road matching algorithm based on topological path tracing can considerably improve navigation error correction in real time and eliminate mismatches caused by complex junctions and other road sections.
Repositioning from the state of positioning loss can be performed rapidly via the search set. Distance thresholds at intersections are set for accurate intersection-state matching and can be selected according to field conditions. The vector road data of the west 5th ring of Beijing and the Miyun area were obtained via Global Positioning System (GPS) Real-Time Kinematic surveying, and the inertial navigation trajectory data of field vehicles were adopted in the simulation experiments. The selection of the intersection distance threshold and comparison tests of the matching effect and real-time performance between the proposed algorithm and the traditional matching algorithm based on road shape features were completed during the experiment, with matching accuracy and matching time as the evaluation standards. The results show that when the distance threshold is 20 m, the intersection matching exhibits the highest accuracy, reaching a maximum of 93.5%. This study also verifies the effect of the algorithm in a real road test. The experimental results show that the proposed road-tracing algorithm achieves better matching accuracy at intersections than the traditional shape-based algorithm: its matching accuracy is 90.2%, and its matching speed increases by 4 to 8 times on the same road segment. The vector road matching algorithm based on path tracing has evident advantages in real-time performance and matching effect. The algorithm is simple and feasible, and its matching accuracy is high, effectively fulfilling the requirements of integrated navigation matching. The proposed algorithm has several advantages over the traditional matching algorithm, including a higher accuracy rate, excellent stability, and faster matching speed.
It can be used to aid integrated navigation under special conditions, such as GPS signal blind zones and signal interference environments, to compensate for the lack of autonomy and anti-interference capability in traditional satellite-based navigation. The experiments also demonstrate the effectiveness and practicality of the proposed method. However, the algorithm still requires further optimization and analysis in terms of parameter adjustment and additional matching methods. We aim to devise more effective aided information and matching methods for integrated navigation in the future.
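The four-state tracing loop (initialization, tracing, intersection, searching) can be sketched as a small state machine. The 20 m junction threshold echoes the paper's best-performing value, but the road-distance threshold and the exact transition rules below are our illustrative assumptions:

```python
from enum import Enum, auto

class MatchState(Enum):
    INIT = auto()
    TRACING = auto()
    INTERSECTION = auto()
    SEARCHING = auto()

def step(state, dist_to_road, dist_to_junction, d_road=15.0, d_junction=20.0):
    # One transition of the road-tracing state machine, driven by the current
    # distance to the matched road and to the nearest intersection (meters).
    if state is MatchState.INIT:
        return MatchState.TRACING if dist_to_road <= d_road else MatchState.SEARCHING
    if state is MatchState.TRACING:
        if dist_to_junction <= d_junction:
            return MatchState.INTERSECTION
        return MatchState.TRACING if dist_to_road <= d_road else MatchState.SEARCHING
    if state is MatchState.INTERSECTION:
        return (MatchState.TRACING if dist_to_junction > d_junction
                else MatchState.INTERSECTION)
    # SEARCHING: reposition via the search set, resume tracing once a road is near.
    return MatchState.TRACING if dist_to_road <= d_road else MatchState.SEARCHING
```

In practice each state would also update the match hypothesis (e.g. selecting the outgoing branch at an intersection); the sketch shows only the transitions.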
      Keywords: path tracing; topological structure; road matching; intersection; track feature
    • Hu Yuanyuan, Fan Jianchao, Wang Jun
      Vol. 22, Issue 5, Pages: 610-621(2017) DOI: 10.11834/jig.160576
      Abstract: Marine floating raft aquaculture is widely distributed in nearshore zones. The effective information extraction of floating raft aquaculture is conducive to the dynamic monitoring of sea area use, which supports the rational use of marine aquaculture resources and a healthy balance of the sea ecological environment. Satellite synthetic aperture radar (SAR) imagery can overcome the influence of the marine meteorological environment and effectively reflect the locations of floating rafts. However, the information on floating raft aquaculture in SAR images is seriously affected by multiplicative speckle noise. Considerable isolated noise points exist on the surface of floating raft aquaculture, and the edges are so fuzzy that clearly distinguishing floating raft aquaculture from the sea background is difficult. Traditional unsupervised algorithms are ineffective because they do not take the characteristics of SAR data into account. To solve this problem, this study improves the local binary pattern algorithm into a generalized local binary pattern (GLBP), which reduces noise sensitivity and obtains texture features suited to SAR data characteristics. The GLBP is then added to the merging criterion of generalized statistical region merging (GSRM). A multi-feature integration model is constructed based on GLBP_GSRM to acquire superpixels with highly consistent texture features and achieve accurate extraction of the information on floating raft aquaculture. Given that SAR data are characterized by multiplicative noise, the local binary pattern operator is improved to obtain the GLBP operator, which is added to the merging criterion of the GSRM to strengthen the merging requirement and acquire superpixels. Superpixel segmentation with texture information yields texture-consistent superpixels and effectively overcomes the heavy contamination of speckle noise.
Nonsubsampled contourlet transform (NSCT) is utilized to obtain the contour feature and enrich the data features. The fuzzy compactness and separation algorithm is used for clustering to achieve the unsupervised extraction of floating raft aquaculture. In the experiments, the sea area near Changhai County in Liaoning Province is selected as the research area, and images from Radarsat-2 SAR (C-band) and TerraSAR (X-band) are used. Different regions in the same image and the same region in different images are selected. Experimental results are compared with the real results of a field survey and show that the proposed model can precisely extract the information of floating raft aquaculture from SAR images of different types; moreover, the proposed model is distinctly superior to the classical unsupervised algorithms. The classification accuracies in the three experiments are 88.31%, 85.02%, and 85.52%, which prove the effectiveness of the proposed model. A comparison of the experimental results for different regions in the same image indicates that, although the backscattering properties of the regions differ and the two regions are affected by speckle noise differently, the proposed method can still accurately and effectively extract the information of floating raft aquaculture. A comparison of the experimental results for the same region in images with different bands and resolutions shows that, although the capture times of the images are several months apart and the floating raft aquaculture has changed somewhat, the backscattering characteristics of the same region are similar and seriously affected by speckle noise. The experimental results show that the proposed model can accurately extract the floating raft aquaculture information of the same region in different SAR images with different resolutions.
Some experiments also analyze the contribution of each part of the model, and the results show that the superpixel segmentation algorithm GLBP_GSRM can obtain superpixels with high texture consistency and reduce the isolated points in the entire image. The contour feature obtained from the NSCT enriches the features of the raw data and makes the areas of floating raft aquaculture more complete, with smoother edges. In brief, the proposed model can effectively overcome the heavy contamination of speckle noise and precisely extract the information of floating raft aquaculture. The proposed model utilizes texture, spatial, and contour features to solve the problem of speckle noise jamming the information extraction. In SAR images of different types, the proposed model can effectively extract the information of floating raft aquaculture from complex oceanic backgrounds and improve the automatic monitoring accuracy of marine aquaculture. The idea behind the proposed model can be widely applied to the information extraction of other target types in sea areas and makes dynamic monitoring of the sea area more convenient. However, in this study, the SAR images used in the experiments differ in both band and resolution, so the influence of resolution alone could not be explored. Therefore, the next step is to study SAR data with different resolutions in the same band and explore the effect of resolution on the information extraction of floating raft aquaculture.
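The texture feature at the heart of GLBP starts from the classical local binary pattern, which thresholds a 3×3 neighborhood against its center pixel; the paper's generalization for multiplicative speckle is not reproduced here. A minimal sketch of the classical operator (the clockwise neighbor ordering is a convention of ours):

```python
def lbp_code(patch):
    # Classical 3x3 LBP: compare the 8 neighbours against the centre pixel and
    # pack the comparison bits, clockwise from the top-left, into one byte.
    c = patch[1][1]
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (y, x) in enumerate(order):
        if patch[y][x] >= c:
            code |= 1 << bit
    return code
```

A GLBP-style variant would replace the `>=` threshold with a ratio-based test that tolerates multiplicative noise, which is the adaptation to SAR statistics the abstract describes.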
      Keywords: floating raft aquaculture; synthetic aperture radar (SAR); unsupervised; local binary pattern; statistical region merging
    • Simulating chalk art style painting

      Qian Wenhua, Xu Dan, Guan Zheng, Pu Yuanyuan, Yu Yangtao, Yang Meng
      Vol. 22, Issue 5, Pages: 622-630(2017) DOI: 10.11834/jig.160540
      Abstract: The non-photorealistic rendering (NPR) technique combines artistic painting rules with scientific methods. This technique can express and transfer information beyond photorealistic computer graphics, and NPR also serves an increasingly important role in medicine, architecture, and education. Many artworks, such as oil paintings, watercolors, and Chinese calligraphy, have recently been simulated. Although numerous NPR methods have been proposed to digitize and simulate artistic works, the exploration and rendering of different artistic styles remain extremely challenging and an open question, and few NPR methods can be adopted to render the chalk art style. This study proposes an NPR technique that generates a chalk art style from a 2D photograph on the basis of diffusion and line integral convolution (LIC). Similar to existing exemplar-based methods, an input natural image is regarded as the foreground image, and an input material image is taken as the background image. We obtain the final chalk art painting by mapping the foreground image onto the background image. First, continuous and smooth edge information is obtained by threshold processing and edge extraction from the target image; the edge detection method is based on the difference-of-Gaussians filter. Considering that real chalk art paintings have coarse lines, we adopt a diffusion technique to simulate this characteristic, and image enhancement is used to strengthen the details of the edge information. Second, when people create a chalk painting, they usually draw with chalk on a blackboard or other material; the chalk pigment is then absorbed by the material to produce the artistic illustration. Therefore, this textural characteristic of the chalk stroke should be simulated in the system. Our algorithm adds white noise to the target image, and the chalk brush texture is simulated based on LIC. A morphological dilation operation is used to generate the final chalk stroke texture.
The real artistic effect of chalk painting is often created on blackboards, wood, and other materials; thus, a blackboard image is input as the background image. Based on the layer-mapping technique, the algorithm merges the brush texture image, the color image, and the edge image onto the blackboard image, and the final chalk painting art style is simulated. We can obtain chalk art illustrations by inputting different source images, and the line details and stroke texture characteristic of chalk are displayed. When different foreground and background images are input, different chalk illustrations are obtained. This study proposes an NPR method for generating the chalk art style. Experimental results demonstrate the effectiveness of our method in producing chalk-line stylistic illustrations. This chalk painting art style is a useful supplement to NPR, and the rendering results advance the NPR field. People without painting experience can create chalk art paintings through our simple system. The proposed method is simple, fast, and easy to implement.
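The stroke-texture step — white noise convolved along a stroke direction, then dilated — can be sketched as follows. This Python/NumPy sketch uses a constant horizontal stroke field and a square structuring element purely for illustration, where the paper would integrate along a per-pixel direction field:

```python
import numpy as np

def lic_constant_field(noise, length=9):
    # Degenerate LIC: average white noise along a constant horizontal stroke
    # direction; a full LIC would integrate along per-pixel streamlines.
    k = np.ones(length) / length
    return np.apply_along_axis(lambda row: np.convolve(row, k, mode="same"), 1, noise)

def dilate(mask, r=1):
    # Morphological dilation with a (2r+1) x (2r+1) square structuring element,
    # used here to thicken the stroke texture into coarse chalk lines.
    H, W = mask.shape
    P = np.pad(mask, r, mode="constant")
    out = np.zeros_like(mask)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out = np.maximum(out, P[dy:dy + H, dx:dx + W])
    return out
```

The smoothed noise produces streaks along the stroke direction, and dilation broadens them, giving the coarse, powdery line quality of chalk.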
      Keywords: non-photorealistic rendering; chalk art; edge detection; image mergence
    • Interactive genetic algorithm for extending 3D scenes

      Zhang Yan, Fei Guangzheng
      Vol. 22, Issue 5, Pages: 631-642(2017) DOI: 10.11834/jig.160547
      Abstract: The design of 3D scenes should obey the rules of architectural organization. At present, 3D scene design is typically carried out by an art designer who lacks knowledge of architecture. A method is proposed in this paper to solve this problem. We extract the plan and facade features of a scene and analyze the features with an interactive genetic algorithm (IGA). A new method is introduced to evaluate the weights and scores of these characteristics and to obtain the fitness values used to optimize the organization of the old structure. After evolution and learning by our algorithm, we can expand and reconstruct the old scene into a new scene with a better organizational form. The reconstructed 3D scene realizes a personalized style and provides a better experience to users. This method is significant in the fields of 3D game design, recovery of historical remains, and landscape design. An adaptive resonance theory neural network is introduced to extract the features of 3D scenes. The ant colony algorithm is then utilized to optimize the layout of the scene, and the interactive genetic algorithm is introduced to obtain the best-adapted individuals to form a larger scene. The algorithm incorporates the intuition and psychological characteristics of the designer. The principle of the method is based on an approximation model that maps the 3D scene to a human psychological space; the optimal solution is obtained from the evaluated fitness values. To avoid the problem of user fatigue, we evaluate the information of samples to train the fitness function and obtain an approximate model for updating information in the process of evolution. The method uses a neural network to cluster the features of 3D models and effectively decreases the work of manual evaluation. This study used a series of specific scenes and extracted scene features based on user evaluation to expand the original scene.
The neural network method is used to realize the reorganization and extraction of features. In addition, the ant colony algorithm is utilized to reorganize and expand the 3D scene. By applying the interactive genetic algorithm, we realize the expansion and reconstruction of the old scene. This research analyzed the optimization design of 3D scenes and proposed an approach to reconstructing and expanding complex 3D scenes. A hierarchical decomposition method is presented to optimize the complex scene and search each layer to maximize the value of the symmetry characteristics. Based on these symmetry features, we can realize the reconstruction, and by using the ant colony algorithm, we obtain the optimized layout scheme. The IGA is introduced to obtain the optimal solutions for the scene. Through the optimization of the IGA, we can obtain more accurate fitness values; optimal individuals can be generated and a more optimized design scheme obtained. This method can quickly generate a large scene with the original features and symmetry, as well as realize the expansion and reconstruction of the old scene. Moreover, it mixes features of the local 3D structure without manual layout and design. One deficiency is that comparing user satisfaction with the reconstructed scene against manually organized scenes still requires substantial experiments for verification. Another disadvantage is that the method does not explore all kinds of implicit aesthetic indexes well; for the analysis of the aesthetic characteristics and style of a 3D scene, it is not yet possible to establish a realistic approximate model for quantitative analysis. In-depth studies and further efforts should be devoted to solving these problems. Experiments show that this method is practical and effective; it can effectively improve the efficiency of the design and enhance the ornamental value.
This method has practical significance in 3D game design, virtual restoration, and visual simulation.  
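The fatigue-reduction idea — have the user score only a few individuals and estimate the rest with a learned surrogate — can be sketched with a toy one-dimensional genotype. All names, the nearest-neighbour surrogate, and the operators below are illustrative assumptions, not the paper's actual model:

```python
import random

def iga_generation(population, user_scores, crossover, mutate):
    # One interactive-GA generation. `user_scores` maps the few individuals the
    # user actually rated to scores; every other individual receives a surrogate
    # fitness (the score of the nearest rated genotype) to limit user fatigue.
    def fitness(ind):
        if ind in user_scores:
            return user_scores[ind]
        nearest = min(user_scores, key=lambda rated: abs(rated - ind))
        return user_scores[nearest]
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[: max(2, len(ranked) // 2)]
    return [mutate(crossover(random.choice(parents), random.choice(parents)))
            for _ in range(len(population))]
```

With a handful of user ratings per generation, selection pressure still moves the population toward the designs the user preferred, which is the point of the approximation model the abstract describes.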
      Keywords: virtual reality; genetic algorithm; ant colony algorithm; adaptive resonance theory; feature extraction
    • Fast point cloud registration based on RGB-D data

      Su Benyue, Ma Jinyu, Peng Yusheng, Sheng Min, Ma Zuchang
      Vol. 22, Issue 5, Pages: 643-655(2017) DOI: 10.11834/jig.160602
      Abstract: Three-dimensional reconstruction of real objects has always been a hot topic in computer graphics, computer vision, and other fields. For the 3D reconstruction of an object captured under non-uniform rotation from non-fixed angles, a registration method for reconstructing an object's 3D model from RGB-D data is presented by using a rotating platform. Initially, the depth data and color data of the target on the rotating platform are collected by Kinect, and a bounding box algorithm is used to remove the background noise and outliers to obtain the required point cloud data with color information. Subsequently, we use point cloud data calibrated at different angles to calibrate the central axis of the rotating platform, thereby obtaining the relative relationship between the Kinect and the rotating platform. Next, the curvature feature of the target point cloud is used to extract feature points, find the corresponding points of adjacent point clouds, and then remove the feature points in non-overlapping regions. For the selection of feature points, we first use a k-d tree to search for the k neighboring points of each point in the point cloud, fit a surface to these points, and then calculate the Gaussian curvature. The n points with the largest Gaussian curvature are taken as the feature points of the point cloud. The value of n is determined by the number of points, the density, and the complexity of the point cloud; thus, the n points reflect the approximate contours or surface features of the object. The distance measure must be reconsidered when selecting corresponding points to better reflect the rotational correspondence of point pairs during rotation. In actual registration, a large number of wrong correspondences are frequently produced when using the Euclidean distance because the point clouds overlap or the points are too far away.
The use of the minimum arc distance to find corresponding points can effectively reduce false point pairs because the target rotates only around the axis of rotation during scanning. Third, a dichotomy (bisection) iteration method is introduced to find the optimal rotation angle around the central axis that minimizes the registration error between the two point clouds. Finally, the point cloud data acquired from the various angles are registered under a unified coordinate system to rebuild the model. Experimental results on Stanford University's point cloud database and a self-collected database were compared with existing methods. Compared with the traditional ICP algorithm and an improved ICP algorithm on the Stanford point cloud database, with an average of 75 000 sampling points, the number of iterations is reduced by 86.5% and 57.5%, respectively, the running time is reduced by 87% and 60.75%, respectively, and the mean square Euclidean distance error is reduced by 70% and 22%, respectively. Compared with the traditional ICP and the improved ICP algorithms on the self-collected database, with an average of 57 000 sampling points, the average number of iterations is reduced by 94% and 75%, the running time is reduced by 92% and 69%, and the average mean square Euclidean distance error is reduced by 61.5% and 30.6%. Experimental results show that the proposed method is more efficient and has a lower registration error. Compared with the KinectFusion algorithm, this method also preserves texture detail well. In this paper, a point cloud registration algorithm based on rotating platform calibration is proposed, and the bisection iterative algorithm effectively reduces the registration complexity.
The comparison of the proposed algorithm with the traditional ICP and improved ICP algorithms also shows its effectiveness. In addition, the superiority of this method is verified by comparing it with other methods in point cloud registration experiments with texture. With this method, a single Kinect suffices for 3D modeling of objects rotated non-uniformly through non-fixed angles. The method is convenient, practical, and suitable for simple and rapid 3D reconstruction applications.
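The bisection (dichotomy) search for the rotation angle described in the abstract can be sketched in a few lines of numpy. This is an illustrative reconstruction under our own assumptions (turntable axis aligned with z, a unimodal error curve, brute-force nearest neighbors), not the authors' code:

```python
import numpy as np

def rot_z(theta):
    """Rotation matrix about the turntable (z) axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def registration_error(src, dst, theta):
    """Mean nearest-neighbour distance after rotating src by theta."""
    rotated = src @ rot_z(theta).T
    d2 = ((rotated[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
    return np.sqrt(d2.min(axis=1)).mean()

def bisect_angle(src, dst, lo=0.0, hi=np.pi / 2, tol=1e-4):
    """Interval-halving search for the angle minimising the error,
    assuming the error curve is unimodal on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if registration_error(src, dst, m1) < registration_error(src, dst, m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)
```

In practice a k-d tree (as the abstract describes) would replace the quadratic nearest-neighbor step for clouds with tens of thousands of points.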
      关键词:RGBD data;3D scanning;point cloud registration;Kinect;depth data   
    • Error protection for key frames in distributed video coding

      Rong Song, Yang Hong, Qing Linbo, Wang Zhengyong
      Vol. 22, Issue 5, Pages: 656-662(2017) DOI: 10.11834/jig.160599
      摘要:Distributed video coding (DVC) has attracted significant attention from many relevant international standardization committees and experts ever since the emergence of distributed source coding (DSC). DSC is a new class of source coding approaches based on the Slepian-Wolf theorem and the Wyner-Ziv (WZ) theorem. Owing to its lightweight encoding and high error robustness, DVC is a good way to meet the demands of new video services that require low power consumption and low complexity, such as video chat, unmanned aerial wireless monitoring, and so on. However, the bit error ratio of the wireless channel is higher than that of the wired channel because of channel attenuation, multipath interference, frequency band mutual interference, and so on. In the DVC system, the video source is split into interleaved key frames and WZ frames, and the side information, regarded as a noisy version of the current WZ frame, is generated by a motion estimation and compensation algorithm applied to the adjacent key frames. Therefore, whether the key frames are correctly transmitted and decoded affects the compression efficiency and rate distortion of the whole system. However, the robustness of the key frames, which use traditional intra-frame coding, is far lower than that of the WZ frames, which are based on channel coding. To improve the robustness and transmission of key frames in a heterogeneous network, this paper presents a quality-scalable protection solution for the key frames in wavelet-domain DVC. At the encoder side, the key frames are encoded simultaneously by traditional HEVC/H.265 (High Efficiency Video Coding) intra-frame coding and by wavelet-domain Wyner-Ziv coding. The HEVC bitstreams are transmitted over the wireless channel. For the WZ bitstreams, the information bits are directly discarded and the generated parity bits are stored in a buffer.
To make the bit rate of the system adapt to different network conditions, different layers of the low-frequency and high-frequency bands of the wavelet-decomposed image can be combined into different enhancement layers. Initially, the decoder determines whether the HEVC bitstreams of the key frames are lost or not. If there is no error, the HEVC bitstreams are decoded directly for reconstruction, and the WZ parity bits in the buffer are deleted. Otherwise, an error concealment technique is used to reconstruct a video frame from the received HEVC bitstreams. In addition, the reconstructed frame is accepted as the side information of the current key frame, and the decoder requests the WZ data of different enhancement layers according to the channel environment. Moreover, the original frame and its corresponding side information roughly obey a Laplace distribution in the DVC system. Therefore, the real practice is to use the forward reference frame and side information to obtain the virtual noise model of the current frame, because the decoder cannot obtain accurate original information. However, if the channel condition is limited and there are simultaneous errors in the key frames, then it is impossible to send the parity data of all enhancement layers. As a result, the quality of the reconstructed forward reference frame may be relatively poor, and the estimated virtual noise model may deviate considerably from the practical situation. Therefore, this paper improves the virtual noise model of the erroneous key frames by exploiting the similarity of the virtual noise models of the same layer in the wavelet-decomposed image. With the decoded bands of the first enhancement layer and their corresponding side information, a virtual noise model of the second and third enhancement layers that better matches the actual one can be obtained.
To validate the effectiveness of the proposed scheme, the luminance components of three video sequences with different motion characteristics are simulated. The rate-distortion performance over packet loss channels with different random packet loss ratios [i.e., packet loss rate (PLR), PLR = 1%, 5%, 10%, 20%] is evaluated. Experimental results show that, in comparison with the traditional error concealment method, the proposed scheme can effectively improve the rate-distortion performance of the reconstructed video image under different channel conditions. Specifically, if only the parity data of the first enhancement layer are transmitted and the loss rate of key frames is 5%, the peak signal-to-noise ratio (PSNR) of the reconstructed video can be improved by about 2 to 5 dB. If the parity data of the second enhancement layer continue to be transmitted, the PSNR of the reconstructed video can be further increased by 0.5 to 1.6 dB. If all parity data of the three enhancement layers are transmitted, the decoded video can basically achieve the same quality as key frames without errors. When the data loss ratio is relatively high, such as 20%, the quality of the video reconstructed by the typical error concealment method can hardly meet basic requirements. However, with the parity data of the first enhancement layer transmitted, the PSNR can be improved by about 4.5 to 8.3 dB in the proposed scheme. If the parity data of the second enhancement layer continue to be transmitted, the PSNR can be further increased by 2.7 to 4.1 dB, and if all parity data of the three enhancement layers are transmitted, the PSNR can be increased by another 3.7 to 4.6 dB. In general, different reconstructed video qualities can be obtained by transmitting different enhancement layers. Experimental results indicate that the proposed error protection scheme for key frames in wavelet-domain DVC can improve the robustness of key frames.
The proposed framework can also improve the rate-distortion performance under different channel environments and requirements. However, the proposed scheme relies on a feedback channel, which causes some delay during decoding. Therefore, rate estimation at the encoder side can be the next direction of research.
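The Laplace virtual noise model mentioned above is typically parameterized by a scale estimated from the residual between a reference band and its side information; for a Laplace distribution f(x) = (α/2)·exp(−α|x|), the mean absolute residual equals 1/α. A minimal sketch (the function name and interface are our own illustration, not the paper's code):

```python
import numpy as np

def laplace_alpha(reference, side_info):
    """Estimate the Laplace scale parameter alpha of the correlation
    noise between a wavelet band and its side information.
    For Laplace(0, b): E|x| = b, and alpha = 1/b."""
    residual = np.asarray(reference, float) - np.asarray(side_info, float)
    return 1.0 / max(float(np.mean(np.abs(residual))), 1e-9)
```

As the abstract describes, an alpha estimated from the decoded bands of the first enhancement layer can be reused as a better-matched model for the second and third layers when the reference frame itself is degraded.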
      关键词:distributed video coding(DVC);key frame;wavelet domain;scalable;bit error channel   
    • Shape-adaptive virtual hand-grasping method

      Hu Chen, Zhang Xuedan, Ma Huimin
      Vol. 22, Issue 5, Pages: 663-670(2017) DOI: 10.11834/jig.160590
      摘要:Virtual hand grasping is one of the core techniques in virtual interaction, and it strongly affects immersion. Given that real-time force analysis is complex, rules that capture the characteristics of hand grasping are commonly formulated instead of complex mechanical calculations. Some existing rules focus on the included angle between the normal vectors of two contact points or the included angle between the two lines connecting each contact point to the object center, but they cannot handle some shapes correctly. This study proposes a shape-adaptive grasping method. This method uses the shape features and contact points of basic geometries, such as the cube, sphere, and cylinder, to design grasping rules. 1) For a cube, at least three contact points that are not all collinear must be on the surface, and at least one of the included angles between the normal vectors of every two contact points must be larger than 90°. First, the positions of all contact points are determined. Second, the cube and all points are rotated so that every edge of the cube is parallel to a world axis, which simplifies the calculation. Third, whether each point lies on a plane, edge, or vertex of the cube is determined from the cube scale. Fourth, the normal vectors of these points are calculated based on their locations. The vector direction is perpendicular to the plane when the point is on a plane, or is the same as the moving direction of the point when the point is on an edge or vertex. Finally, the included angles between every two normal vectors are calculated, and the grasping result is determined. 2) For a sphere, at least four contact points must exist, and the spatial relationship between these points and the sphere center should satisfy the proposition obtained from the analysis. First, the positions of the sphere center and all contact points are identified.
Second, the center and every two of these points are used to form a plane, and whether all other points lie on the same side of this plane is calculated. Finally, the grasping result is determined from the calculation results. 3) For a cylinder, when the contact points are all on the curved surface, a rule similar to the sphere rule is used. First, the positions of all contact points are determined, and the cylinder and all the points are rotated so that the bottom plane of the cylinder is parallel to a world axis. Second, these points are projected onto the bottom plane of the cylinder. Third, the circle center of the bottom plane and every projected point are used to form a diameter, and whether all other projected points lie on the same side of the diameter is calculated. Finally, the grasping result is generated from the calculation results. When points exist on the top or bottom plane, the cube rule is used instead. This method can realistically handle the grasping of an object with a curved surface. For a complex object that can be composed of multiple basic geometries, our method divides the calculation into two steps. First, the object is decomposed into the three basic geometries, and each geometry is calculated separately. Some geometries nearly meet the grasping rule; these are called unstable states. Second, our method selects each of these unstable-state geometries, gathers information such as the positions and normal vectors of all contact points on it, and decides whether the entire object is caught using the basic geometry rules obtained before. Calculating the possibility of moment balance further reduces erroneous determinations. All normal vectors point opposite to the contact forces, but the force magnitudes are uncertain. The proposition obtained from the analysis implies that the moment may balance when the normal vectors meet a specific spatial relationship similar to the sphere rule.
First, all normal vectors are normalized and projected onto a unit sphere; thus, they become points on its surface. Second, the same proposition as in the sphere rule is evaluated on these points. Finally, whether the moment is balanced is determined. Only when both the rules and moment balance are satisfied will the grasping be determined a success. Experimental results obtained with Unity3d software and the Neuron Data Glove show that the method can effectively handle the grasp of objects with curved surfaces, such as balls, and of complex objects, such as cups. When the virtual hand merely touches the upper hemisphere, the ball is not caught; only if the fingers ring around the ball will it be caught. Similarly, casually grabbing at the cup will not catch it; only when the gesture is correct and the moment is balanced can the cup be caught. This study proposes a shape-adaptive grasping method and a calculation method that decomposes complex objects into simple basic geometries. The method handles grasping effectively and conforms to intuitive feel.
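The cube rule above (at least three non-collinear contact points, with some pair of contact normals spanning more than 90°) can be checked with a short numpy routine. This is a hedged sketch of the rule as stated in the abstract, not the paper's implementation:

```python
import numpy as np
from itertools import combinations

def is_grasped(points, normals):
    """Cube grasp rule sketch: at least 3 non-collinear contact points,
    and at least one pair of contact normals whose included angle
    exceeds 90 degrees (i.e. opposing pushes on the object)."""
    points = np.asarray(points, float)
    normals = np.asarray(normals, float)
    if len(points) < 3:
        return False
    # Collinearity test: centred points must span at least a plane's worth
    # of directions, i.e. rank of the centred point matrix must be >= 2.
    if np.linalg.matrix_rank(points - points.mean(axis=0)) < 2:
        return False
    for a, b in combinations(normals, 2):
        cosang = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        if cosang < 0:  # included angle > 90 degrees
            return True
    return False
```

Three fingertips pressing from opposite faces satisfy the rule, while three contacts pushing in the same direction (e.g. a palm resting on top) do not.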
      关键词:virtual hand;basic geometry;grasping rule;collision detection;unstable state;moment balance   
    • Vehicle detection method based on fast R-CNN

      Cao Shiyu, Liu Yuehu, Li Xinzhao
      Vol. 22, Issue 5, Pages: 671-677(2017) DOI: 10.11834/jig.160600
      摘要:The traditional vehicle target detection problem is typically divided into two steps. The first step is generating hypotheses, that is, locating image regions where a vehicle target may exist, thus reducing the area that must be examined. The second step is verifying the hypotheses, that is, testing whether a vehicle target is actually present in those regions. In the first step, different features must be designed for different scenes. Features commonly used in vehicle detection include symmetry, color, shadows, corners, edges, textures, and lights. In the second step, hypothesis verification is typically based on template matching or on appearance features. In addition to the above basic features, HOG, Harris, SIFT, and other hand-crafted features are also typically used. Finally, the test results are obtained through a support vector machine or other classifiers applied to the feature matrix. This whole process is detrimental to the generalization of detection, because suitable features must be selected by hand for each scenario. This paper proposes a vehicle detection method based on Fast R-CNN, which can find vehicle objects in scene images. The method is based on the deep convolutional neural network idea from deep learning. First, the visual task is defined using the vehicle images to be detected. The candidate regions of the sample image are obtained by the selective search algorithm, and the candidate region coordinates are input to the network together with the sample images of the visual task. The sample image is processed by the convolution and pooling layers of the deep convolutional neural network, and deep convolution features are obtained. The size of the sample image is not fixed at input time, so the resulting convolution features vary in size.
Subsequently, the features are normalized to a fixed size by the region-of-interest (RoI) pooling layer of the Fast R-CNN network structure. Finally, the features are input into different fully connected branches, which compute the feature classification and the detection box coordinates in parallel by regression. After several iterations of training, the target detection model, which is strongly related to the specified visual task, is obtained together with its trained weight parameters. In a new scene image, a vehicle target of the given type can be detected by this model. The method uses a test dataset related to the vision task to test the object detection model. The experiments suggest that the vehicle detection model achieves effective detection when the scenes of the test samples are strongly correlated with the vision task. In the experiments, the visual task is defined with bus and car as the two categories, and the background scene is the city road. Experimental results show that when the correlation between the test sample scene and the visual task is high and the deformation of the vehicle target in the sample is small, the obtained vehicle target detection model has a good detection effect. The vehicle target detection method proposed in this paper uses the convolutional neural network to extract convolution features, replacing the traditional manual feature extraction process. With Fast R-CNN, a vehicle target detection model with good performance is obtained on the visual task defined by the sample images. The model can achieve well-performing vehicle target detection on new scene images that are strongly related to the visual task.
In this paper, convolution features are used to replace traditional manual features, in combination with a deep convolutional neural network, to avoid the feature selection problem of traditional detection. Deep convolution features have better expressive ability. Based on the Fast R-CNN network, the vehicle detection model is obtained after several iterations. The detection model has a good detection effect on the visual task specified in this paper. This paper thus provides a more general and concise solution to vehicle detection problems.
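The RoI pooling step that normalizes variable-size candidate regions to a fixed grid can be illustrated as follows. This is a simplified numpy sketch of Fast R-CNN's RoI max pooling; the real layer also maps RoIs from image coordinates to feature-map coordinates and back-propagates gradients:

```python
import numpy as np

def roi_pool(feature_map, roi, out_size=7):
    """Max-pool one region of interest down to a fixed
    out_size x out_size grid, per channel.
    feature_map: (H, W, C) array; roi: (x0, y0, x1, y1)
    already expressed in feature-map coordinates."""
    x0, y0, x1, y1 = roi
    region = feature_map[y0:y1, x0:x1]
    h, w = region.shape[:2]
    ys = np.linspace(0, h, out_size + 1).astype(int)
    xs = np.linspace(0, w, out_size + 1).astype(int)
    out = np.zeros((out_size, out_size, feature_map.shape[2]))
    for i in range(out_size):
        for j in range(out_size):
            # Guard against empty cells when the RoI is small.
            cell = region[ys[i]:max(ys[i + 1], ys[i] + 1),
                          xs[j]:max(xs[j + 1], xs[j] + 1)]
            out[i, j] = cell.max(axis=(0, 1))
    return out
```

Because every RoI is reduced to the same 7×7×C shape regardless of its original size, the subsequent fully connected classification and box-regression branches can share fixed-size weights.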
      关键词:fast region convolutional neural network(Fast R-CNN);deep learning;vehicle;vision tasks;object detection   
    • Fast classification of natural scene and born-digital images

      Liu Guoshuai, Zhong Weifeng, Yin Fei, Liu Chenglin
      Vol. 22, Issue 5, Pages: 678-687(2017) DOI: 10.11834/jig.160597
      摘要:The rapid development of the Internet, smartphones, sensing, and communication technology has resulted in the rapid increase of multimedia data on the Internet, such as texts, images, and videos, which provide rich information and great convenience to our life. On the other hand, it becomes increasingly difficult to exploit the information embedded in such heterogeneous data. To effectively extract the contents embedded in web images, it is beneficial to classify the images into different types so that the contents can be fed to different procedures for detailed analysis. In this paper, a hierarchical algorithm is proposed for the fast genre classification of natural scene images and born-digital images, which are the most prevalent image types on the Web. Our algorithm consists of two stages; the first stage extracts certain global features, such as the coherence of highly saturated pixels, the average contrast of edge pixels, and the color histogram. All feature measures are designed based on distinct differences between natural scene images and born-digital images. Specifically, the coherence of highly saturated pixels measures the different patterns of pixel-to-pixel color transitions appearing in the two types of images. Natural scene images often depict objects of the real world, and neither regions of constant color nor coherent regions of highly saturated pixels are common in this type of image because of the natural texture of objects, noise, and diversity of illumination conditions. By contrast, born-digital images tend to have larger regions of constant color and more blocks consisting of highly saturated pixels. The second measure describes the edge contrast. Typically, edges in born-digital images occur between regions of constant color and the transitions are very steep, while their counterparts in natural scene images often correspond to boundaries between objects of the real world and are much smoother owing to light variations and shading.
We introduce the color histogram as the third global measure, considering that certain colors are much more likely to appear in born-digital images than in natural scene images. After extraction, the global features are fed into a support vector machine (SVM) classifier to classify an image. Global features are not very discriminative for separating confusing images but succeed in capturing the difference in appearance between most common natural scene images and born-digital images. To this end, we introduce the second stage. Images assigned low confidence by the first-stage classifier are processed by the second stage, which extracts local texture features represented in the bag-of-words framework and uses another SVM classifier for final classification. In this stage, three types of local patches are exploited, namely, local smooth patches, local edge patches, and local random patches, and corresponding texture features are extracted. The local binary pattern feature is used to describe the first two types of local patches, and a reduced color index histogram is used for local random patches. In comparison with global features, these local descriptors represent the different patterns of color transitions and properties of edges between the two types of images in a more detailed way. All local descriptor vectors are quantized using the locality-constrained linear coding (LLC) algorithm. A two-step clustering process is also adopted to achieve a more discriminative codebook. Initially, the k-means clustering algorithm finds certain sub-centers for each image in the training set, and then all sub-centers are gathered and clustered again to generate the final codebook. We build the codebook for each type of local descriptor individually, generate three representation vectors for each image in the bag-of-words framework, and concatenate them into a final local feature vector for the second classifier.
In addition, two strategies are designed to train the second classifier and generate the final label in the second stage, depending on how the local features are used. The first strategy is to train the second classifier in the combined global and local feature space with all training samples and directly use its predicted label as the final classification result. In the second strategy, we use only local features to train the second classifier and then fuse the posterior probability vectors of the two classifiers with different weight coefficients. The image is then categorized into the class with higher confidence according to the fused result. A database containing more than 30 000 images from various sources is also built to experimentally validate the effectiveness of our proposed method. The discriminative global and local features proposed in this paper are effective and efficient for classifying natural scene images and born-digital images. An overall accuracy of 98.26% and a processing speed of over 40 fps (frames per second) are obtained on our test image set on an Intel Xeon(R) CPU (2.50 GHz). In our experiments, the proposed hierarchical framework presents accuracy comparable to direct classification using both global and local features, but at a faster speed. We also compare our method with a deep neural network, a convolutional neural network (CNN)-based method, which has recently become very popular in image classification. The selected CNN architecture is the typical LeNet-5. Experimental results show that our method is comparable with the CNN-based method in classification accuracy but consumes much less computer memory, which means that our algorithm is faster when computational resources are limited. In addition, the CNN suffers from heavy computation in both training and testing and is usually implemented on a GPU for parallel computation.
Therefore, it is not suitable for fast genre classification of web images, which exist in huge numbers. In this paper, a fast classification algorithm is proposed to classify web images into two major types, namely, natural scene images and born-digital images. We likewise contribute a database containing over 30 000 images for future research. The hierarchical classification algorithm consists of two stages and achieves a good tradeoff between classification accuracy and processing speed. It can likewise be used in large-scale, real-time image-based retrieval systems and other practical data-mining applications as an effective pre-filter module.
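The first global feature, coherence of highly saturated pixels, might be computed along these lines. This is an illustrative definition assumed by us (the paper's exact measure may differ): a saturated pixel counts as coherent when its 4-neighbors are also saturated, so large flat regions of pure color score high:

```python
import numpy as np

def saturation_coherence(hsv_image, sat_thresh=0.9):
    """Fraction of highly saturated pixels whose 4-neighbours are also
    highly saturated. Born-digital images tend to score higher because
    of large constant-colour regions. hsv_image: (H, W, 3), S in [0, 1]."""
    sat = hsv_image[..., 1] >= sat_thresh
    if not sat.any():
        return 0.0
    coherent = np.ones_like(sat)
    coherent[1:] &= sat[:-1]      # neighbour above must be saturated
    coherent[:-1] &= sat[1:]      # neighbour below
    coherent[:, 1:] &= sat[:, :-1]  # neighbour to the left
    coherent[:, :-1] &= sat[:, 1:]  # neighbour to the right
    return float((sat & coherent).sum() / sat.sum())
```

A screenshot filled with pure-color UI blocks scores near 1, while an isolated saturated pixel in a photograph contributes nothing, matching the intuition in the abstract.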
      关键词:fast genre classification of images;feature extraction;bag-of-words;hierarchical classification method   
    • Application of band energy adjustment in Fourier ptychographic microscopy

      Kuang Yawei, Duan Chaijie, Ma Hui
      Vol. 22, Issue 5, Pages: 688-693(2017) DOI: 10.11834/jig.160591
      摘要:Fourier ptychographic microscopy (FPM) is an imaging technique for reconstructing high-resolution images using low-resolution images acquired under a set of different angles of incident light. This technique can bypass the resolution limit of the employed optics. The FPM algorithm rests on two main theoretical bases. The first is the phase retrieval technique, which was originally developed for electron imaging. This technique recovers the lost phase information from intensity measurements, and it typically consists of alternately enforcing the known information of the object in the spatial and Fourier domains. The second is aperture synthesis. This technique was originally developed for radio astronomy to surpass the resolution limit of a single radio telescope. Its basic idea is to combine images from a collection of telescopes in the Fourier domain to improve the resolution. By integrating the two techniques, FPM can transform a conventional microscope into a high-resolution, wide field-of-view one. The difference between a low-resolution image and a high-resolution image in the frequency domain is reflected in the energy of the high-frequency bands, which is abundant in the high-resolution image. However, the high-frequency band energy reconstructed by the former algorithm remains small. This study proposes a new iterative updating mode of FPM, band energy adjustment in FPM (BE-FPM), to solve the problem. This method is based on the energy distribution of the Fourier space of high-resolution images. The whole iteration process for every image is divided into two steps. The first step conducts the recovery following conventional FPM, which updates the sub-regions of the Fourier spectrum with the recorded low-resolution images. The second step applies the new updating mode, namely, band energy adjustment, in the iterative process.
The energy distribution of a high-resolution image, calculated from a similar high-resolution sample, is applied as the prior. The Fourier spectrum is divided into several bands, each covering a different frequency range. The energy of each band is calculated and adjusted according to the high-resolution prior. The reconstructed image is brought closer to the high-resolution image by adjusting the energy of the different frequency bands. After the iterative process for one image, the process is conducted for every captured low-resolution image several times until convergence is achieved. We conduct experiments on resolution board and bean hole data. Compared with the updating mode used in embedded pupil function recovery for FPM (EPRY-FPM), the BE-FPM mode can further improve the resolution of the reconstructed image and highlight the edge information. The elements of group eight on the resolution board are reconstructed more clearly in the BE-FPM result, and the boundary of the bean hole achieves a much clearer reconstruction with BE-FPM. Gaussian noise and salt-and-pepper noise are added to the originally captured low-resolution images to test the robustness of BE-FPM. Reconstructions of the noisy images using EPRY-FPM and BE-FPM show that BE-FPM is more robust to noise than EPRY-FPM. This paper presents a new iterative updating mode of FPM, namely, BE-FPM. Experiments on resolution board and bean hole data show that the BE-FPM updating mode can further improve the resolution of the reconstructed image and highlight the edge information, and that it is more robust than EPRY-FPM when the recorded images contain noise.
For biological samples, numerous images have similar distributions, and such samples have similar energy distributions in the Fourier space. Therefore, BE-FPM has potential for reconstructing an entire sample from a partial high-resolution image and for reconstructing samples of the same class via a single high-resolution image.
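The band energy adjustment step can be sketched as scaling radial bands of the Fourier spectrum toward a high-resolution prior. This is our own simplified reconstruction of the BE-FPM update; the band partitioning (equal-width radial annuli) and interface are assumptions:

```python
import numpy as np

def adjust_band_energy(spectrum, prior_energy, n_bands=4):
    """Scale each radial band of a (fftshifted) Fourier spectrum so that
    its energy matches a high-resolution prior.
    spectrum: complex 2-D array; prior_energy: length-n_bands array of
    target energies, one per radial band from low to high frequency."""
    h, w = spectrum.shape
    yy, xx = np.mgrid[:h, :w]
    r = np.hypot(yy - h / 2, xx - w / 2)
    edges = np.linspace(0, r.max() + 1e-9, n_bands + 1)
    out = spectrum.copy()
    for k in range(n_bands):
        mask = (r >= edges[k]) & (r < edges[k + 1])
        e = np.sum(np.abs(spectrum[mask]) ** 2)
        if e > 0:
            # Uniform complex gain per band preserves phase, fixes energy.
            out[mask] *= np.sqrt(prior_energy[k] / e)
    return out
```

Because the gain is a real scalar per band, the phase recovered by the conventional FPM step is untouched; only the band energies are pulled toward the high-resolution prior.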
      关键词:Fourier ptychographic microscopy;high resolution;image reconstruction techniques;energy distribution;Fourier space   
    • Colon images registration based on the pre-match of haustral folds

      Guo Zhifei, Duan Chaijie, Liang Zhengrong
      Vol. 22, Issue 5, Pages: 694-701(2017) DOI: 10.11834/jig.160601
      摘要:Virtual colonoscopy is a method that scans through the 3D structure of a colon constructed from CT or MRI images to detect colon tissue, specifically polyps, and is of great significance for early colorectal cancer screening. Registering colon images acquired in different body positions can greatly improve the efficiency and accuracy of polyp detection in virtual colonoscopy. However, traditional registration cannot accomplish this goal because of the drastic change between the supine and prone positions of the patient, and the feature point extraction in existing registration schemes does not account for the more special cases. Therefore, finding a new registration scheme for colon images is crucial. In this paper, a new colon registration method that can register images acquired in different positions is proposed. Many previous works flatten the three-dimensional (3D) colon into two-dimensional (2D) coordinates because of the large volume of 3D colon data; registering 2D colon images directly is more convenient and easier to use in clinical application. Substantial work has been done on extracting effective feature points of the two images for registration. However, feature points acquired with previous methods cannot fully reflect the structural characteristics of colon images obtained in different body positions, particularly the correlation between the two images. Haustral folds are remarkable, distributed over the entire colon, and stable while the patient switches position. Therefore, the centers of corresponding folds can fully reflect the structural characteristics of the colon and are very suitable for colon registration. This paper proposes a new method that matches the folds of the two colon images and extracts the centers of matched folds as the feature points for registration. The complete registration scheme is as follows.
First, the haustral folds that reflect the fold structure information are extracted. The matched haustral folds are gathered using template matching and feature matching, and the centers of matched folds are extracted as the feature points for image registration. Subsequently, non-rigid registration based on the centers of matched haustral folds is performed as coarse registration. After coarse registration, the deformation between the two images is reduced to a more appropriate range. Finally, b-spline registration based on gray value is used to complete the fine registration of the two images. This method can recover the drastic deformation of the colon image and keep the difference between the two images within certain limits, after which the traditional registration methods of previous works can be applied. Trained observers can identify the two sets of matching colon folds because the folds are prominent in the colon. In the five sets of clinical data, about 62% of haustral folds can be matched in comparison with the results of manual matching, and the matching error rate is 4.7%. The center of a matched fold is a significant feature point with high confidence in the two colon images and is very stable when the patient switches from the supine to the prone position; it is more illustrative than other structural feature points in colon registration. The colonic deformation decreases after the rough registration based on the folds, which allows the two colon images to be registered using the traditional registration method. We accomplish the registration of colon images by using b-spline registration based on the coarse registration. After registration, the absolute gray-level difference between the two colon images is clearly reduced, and the colon pair with the highest ratio of matched folds has the best registration accuracy.
The gray values of the folds differ somewhat because of the different rendering angles, so the absolute gray-level difference between the images retains a certain value even after registration. Moreover, the gray-level difference reflects only the global difference between the two colon images; the structural features of the colon cannot be well evaluated by it. Therefore, the difference in polyp locations detected by clinicians will be added to improve the evaluation method in future work. In summary, a new colon registration scheme is proposed in this paper. The drastic deformation between colon images caused by colon displacement and colon leakage makes the registration environment complex and renders previously acquired feature points ineffective. Compared with feature points obtained by previous methods, the centers of matched folds used as feature points characterize the colon structure more distinctly. The deformation between the two colon images can be corrected by the rough registration, and fine registration is then completed with the B-spline registration method based on the coarse result. In the present work, selecting the center points of the matched folds is still a somewhat subjective process: using all the center points may cause over-registration, while using too few may cause under-registration, so more quantitative work on the final selection of feature points remains to be done. Furthermore, the assessment cannot be completed simply by the gray-level difference when performing registration evaluation; additional criteria should therefore be adopted to improve the registration evaluation.  
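The template-matching step used to pair haustral folds can be sketched with normalized cross-correlation. This is an illustrative reconstruction, not the authors' code; the image and template sizes, and the fact that a single exhaustive scan suffices, are assumptions for the sake of a self-contained example.

```python
import numpy as np

def ncc(patch, template):
    """Normalized cross-correlation between an image patch and a fold template.

    Returns a score in [-1, 1]; 1 means a perfect linear match.
    """
    p = patch - patch.mean()
    t = template - template.mean()
    denom = np.sqrt((p ** 2).sum() * (t ** 2).sum())
    return float((p * t).sum() / denom) if denom > 0 else 0.0

def match_template(image, template):
    """Slide the template over the image (stride 1) and return the
    best-scoring top-left position and its NCC score."""
    th, tw = template.shape
    best, best_pos = -1.0, (0, 0)
    for i in range(image.shape[0] - th + 1):
        for j in range(image.shape[1] - tw + 1):
            s = ncc(image[i:i + th, j:j + tw], template)
            if s > best:
                best, best_pos = s, (i, j)
    return best_pos, best

# Synthetic check: plant a known fold-like patch and recover its position.
rng = np.random.default_rng(0)
image = rng.random((20, 20))
template = image[5:9, 7:11].copy()
pos, score = match_template(image, template)
```

In practice the fold templates would come from the flattened supine image and be scanned over the flattened prone image, with feature matching used to prune implausible pairs.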
      关键词:virtual colonoscopy;flatten;registration;haustral fold;matching   
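The coarse alignment from centers of matched folds can be illustrated with a least-squares transform estimated from landmark pairs. The paper's coarse step is non-rigid; the affine fit below is a deliberately simplified stand-in, and all coordinates are synthetic.

```python
import numpy as np

def estimate_affine(src, dst):
    """Least-squares 2D affine transform mapping src landmarks to dst.

    src, dst: (N, 2) arrays of matched fold centers (illustrative input).
    Returns a 2x3 matrix A so that dst ~= [x, y, 1] @ A.T.
    """
    src_h = np.hstack([src, np.ones((len(src), 1))])   # homogeneous coords
    X, *_ = np.linalg.lstsq(src_h, dst, rcond=None)    # solve src_h @ X = dst
    return X.T

def apply_affine(A, pts):
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    return pts_h @ A.T

# Fold centers in the prone image as a rotated, shifted copy of the supine ones.
supine = np.array([[10., 20.], [40., 25.], [70., 30.], [100., 28.]])
theta = 0.05
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
prone = supine @ R.T + np.array([5., -3.])

A = estimate_affine(supine, prone)
residual = np.abs(apply_affine(A, supine) - prone).max()
```

After such a coarse alignment brings the residual deformation into a manageable range, an intensity-based B-spline registration can refine the result, as the abstract describes.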
    • Zhou Min, Shi Zhenwei, Ding Huoping
      Vol. 22, Issue 5, Pages: 702-708(2017) DOI: 10.11834/jig.160595
      摘要:Aircraft classification in remote-sensing images, that is, identifying the types of aircraft in remote-sensing images rapidly and accurately, is undoubtedly helpful for providing military information and gaining military advantage. This task has great potential in research on the processing of optical remote-sensing images. Various conventional machine-learning methods exist to solve this problem. However, these methods are difficult to apply to real optical remote-sensing images because of the complicated background. Features also need to be selected when using these methods, and the classification performance greatly depends on which features are extracted. Artificial selection of features, which is time consuming and complicated, is usually required to obtain a relatively good result. Deep convolutional neural networks have gained popularity in recent years. These networks, which learn features by themselves and show excellent generalization capability, have been widely used in computer vision and pattern recognition. However, at present, convolutional neural networks are rarely applied to this issue. This study aims to solve the aircraft classification problem in optical remote-sensing images using convolutional neural networks. Considering the lack of a public dataset of aircraft in optical remote-sensing images, this study collects eight types of aircraft from optical remote-sensing images to form a dataset, including bombers, carriers, fighters, primary trainers, and tankers, with an equal number of samples per type. The dataset is divided into training and testing sets at a ratio of 4:1, with samples randomly assigned to either set; for each aircraft type, the ratio between the numbers of training and testing samples remains the same. Because convolutional neural networks require large amounts of data, the training set is heavily augmented with three methods during training. 
The final training set is 32 times as large as the original training set. This study designs a special five-layer convolutional neural network for aircraft classification in optical remote-sensing images based on the theory of deep convolutional neural networks. The convolutional and pooling kernels used in the shallow layers are as small as 3×3 pixels to adapt to the specifics of aircraft classification in remote-sensing images, such as the lack of data and the low absolute resolution. Initially, this convolutional neural network is trained individually on different training sets augmented to varying degrees, and the trained networks are tested on the same test set. The result shows that test accuracy can be improved from 72.4% to 97.2% by augmenting the training set. Second, this paper trains and tests the aforementioned five-layer convolutional neural network on the aircraft dataset. The network is also trained individually on the original and augmented training sets to verify that data augmentation is necessary, and the resulting networks are evaluated on the same test set. According to this experiment, the network trained on the largest training set performs best; therefore, data augmentation is suitable and necessary, and the dataset used in the following experiments is the augmented one. After establishing the dataset, this paper selects one classical conventional machine-learning method for comparison to thoroughly verify the feasibility of the convolutional neural network. Furthermore, LeNet-5, the first practical convolutional neural network, is used to identify the eight types of aircraft for comparison. The three methods use the same dataset in the training and testing phases. The experimental result shows that the classification accuracy reaches 97.2% with the designed five-layer convolutional neural network. 
By contrast, the accuracies of the other two algorithms are 82.3% and 88.7%, respectively. Therefore, the designed five-layer convolutional neural network performs better than the classical conventional machine-learning method and LeNet-5. This result is attributed to the fact that the conventional machine-learning method requires image segmentation, which is difficult to perform in complicated scenarios, while LeNet-5 is primarily designed for handwritten digit recognition rather than aircraft classification and is thus relatively small and not well suited to this issue. In summary, this paper designs a five-layer convolutional neural network specifically for aircraft classification in optical remote-sensing images, a field in which deep convolutional neural networks have rarely been applied. The experiments show that this convolutional neural network can effectively learn aircraft features and classify them with high accuracy.  
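The effect of the small 3×3 kernels highlighted above can be illustrated with a minimal "valid" 2D convolution: each 3×3 layer shrinks the spatial dimensions by only 2 pixels, which suits small, low-resolution aircraft chips. This is a generic sketch, not the paper's network code; the input size and kernel values are assumptions.

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Minimal 'valid' 2D convolution (no padding, stride 1).

    A 3x3 kernel maps an HxW input to an (H-2)x(W-2) output, so stacking
    several such layers preserves most of the limited spatial detail.
    """
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

img = np.random.rand(32, 32)                 # a small remote-sensing chip
edge = np.array([[-1., 0., 1.]] * 3)         # simple 3x3 horizontal-gradient kernel
feat = conv2d_valid(img, edge)
```

A framework implementation would of course vectorize this and learn the kernel weights; the loop form only makes the output-size arithmetic explicit.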
      关键词:optical remote sensing images;aircraft;classification;deep learning;convolutional network   
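The abstract states that three augmentation methods expand the training set 32-fold but does not detail them. One plausible combination reaching exactly 32 variants per image, assumed here purely for illustration, is 4 right-angle rotations × 2 horizontal flips × 4 small translations:

```python
import numpy as np

def augment(img):
    """Expand one image into 32 variants: 4 rotations x 2 flips x 4 shifts.

    The specific combination is an assumption for illustration; the paper
    only states that three methods yield a 32x larger training set.
    """
    out = []
    for k in range(4):                              # rotations by 0/90/180/270 deg
        rot = np.rot90(img, k)
        for flipped in (rot, np.fliplr(rot)):       # identity + horizontal flip
            for dy, dx in [(0, 0), (1, 0), (0, 1), (1, 1)]:  # small shifts
                out.append(np.roll(flipped, (dy, dx), axis=(0, 1)))
    return out

img = np.arange(64, dtype=np.float32).reshape(8, 8)  # toy square image chip
variants = augment(img)
```

Whatever the exact transforms, label-preserving geometric augmentation of this kind is what lifts the reported test accuracy from 72.4% to 97.2%.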