
About the Journal
Journal of Image and Graphics (JIG) is a peer-reviewed monthly periodical. Published since 1996, JIG is an open forum and platform that aims to present all key aspects, theoretical and practical, of broad interest in computer engineering, technology, and science in China. Its main areas include, but are not limited to, state-of-the-art techniques and high-level research in image analysis and recognition, image interpretation and computer visualization, computer graphics, virtual reality, system simulation, and animation, as well as other active topics that meet application requirements in urban planning, public security, network communication, national defense, aerospace, environmental change, medical diagnostics, remote sensing, surveying and mapping, and other fields.
Image Processing and Coding
- Adaptive zero-watermarking algorithm based on boost normed singular value decomposition. Xiao Zhenjiu, Jiang Dong, Zhang Han, Tang Xiaoliang and Chen Hong. doi:10.11834/jig.180443
08-01-2019
Abstract: Objective In boost normed singular value decomposition (BN-SVD), the anti-attack scaling ratio β is the most commonly used parameter; however, it normally has to be obtained through numerous experiments and carries a degree of randomness. Thus, an adaptive zero-watermarking algorithm based on BN-SVD was proposed. Using this parameter presents three advantages. First, the singular values of the image are enlarged, the sensitivity of the image to attacks is reduced, and the robustness of the algorithm is improved to some extent. Second, the singular values are limited to a certain range, and the diagonal distortion problem can be solved by equalizing the grayscale in the diagonal direction. Third, the singular value vector becomes specific to the image, so that the correspondence between a singular value vector and an image is one-to-one and the singular values can represent the features of the image; thus, the false alarm problem is solved. Method First, the original image was divided into non-overlapping blocks. Then, slant transform (ST) was performed on each block matrix. BN-SVD was applied to each block matrix after ST to obtain its maximum singular value, and a feature vector was created by comparing each maximum singular value with the average maximum singular value. The watermark image was processed by Arnold transformation and logistic mapping to obtain an encrypted and scrambled, double-encrypted watermark image. Finally, the zero watermark was constructed by an XOR operation between the feature vector and the double-encrypted watermark image. During optimization, parameter β was determined by continuous training and updating through the fitness function of the beetle antennae search (BAS) algorithm. Similar to the genetic algorithm and particle swarm optimization, BAS does not need to know the specific function form or gradient information. The optimization process can be realized independently; BAS uses a single individual, requires little computation, and converges quickly. The algorithm is inspired by the search behavior of beetles. The biological principle is as follows: the beetle relies on the strength of the food smell to find food. Its two antennae randomly search nearby areas, and when the antenna on one side detects a higher odor concentration, the beetle turns in that direction. Following this simple principle, the beetle can effectively find food. Results Under JPEG compression, rotation, filtering, clipping, and other attacks, the normalized correlation coefficients between the extracted watermarks and the original watermark exceeded 98%. Lena, Baboon, and Bridge were selected as the original grayscale images, and binary watermark images of "Liaoning Technical University" in two different sizes were chosen. Several sets of experiments were conducted. In the experiments, the normalized correlation coefficient (NC) was used to analyze the similarity between the original watermark and the extracted watermark, and the optimal parameter β for the 16×16 pixel and 32×32 pixel watermark images was found by the BAS optimization algorithm. The optimal β values for the three gray images Lena, Baboon, and Bridge were 0.2983, 0.6424, and 0.5332 for the 16×16 pixel watermark images and 0.7370, 0.9914, and 0.8735 for the 32×32 pixel watermark images. The experimental results revealed that the NC value of the watermark was affected as the attack intensity increased and under mixed attacks; however, most NC values still exceeded 0.99.
The NC value of the watermark extracted after geometric attacks, such as clipping and rotation, was close to 1. Because the original gray image was rotated and some pixels were lost during clipping, the extracted watermark was incomplete. A larger compression attack parameter corresponded to a larger NC value, indicating that the algorithm resisted JPEG compression well. For all kinds of noise attacks, the NC value of the extracted watermark exceeded 0.99. Conclusion The BAS algorithm can adaptively determine parameter β in BN-SVD. The optimal scaling enhanced the singular values of the image and reduced the sensitivity of the image matrix to attacks. The diagonal distortion and false alarm problems caused by singular value decomposition were solved effectively, and the robustness of the watermarking algorithm was improved. Compared with traditional optimization algorithms, the BAS algorithm offers short training time, fast convergence, and good robustness. By integrating the concept of the zero watermark, the contradiction between the robustness and invisibility of the watermark is also resolved.
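To make the optimization step concrete, the sketch below implements the beetle antennae search update described in the abstract for a single scalar parameter. It is only a schematic reading of the procedure, not the authors' code: the fitness function, the antenna-length and step-size decay schedule, and the clipping range for β are assumptions introduced for illustration.

```python
import numpy as np

def bas_optimize(fitness, beta0=0.5, d0=0.3, step0=0.2, iters=100, decay=0.95, seed=0):
    """Beetle antennae search over one scalar parameter (here, the BN-SVD ratio beta).

    fitness : callable returning a score to maximize, e.g. the mean NC of the
              watermark recovered after a fixed set of attacks (assumed helper).
    """
    rng = np.random.default_rng(seed)
    beta, d, step = beta0, d0, step0
    best_beta, best_fit = beta, fitness(beta)
    for _ in range(iters):
        direction = rng.choice([-1.0, 1.0])        # random antenna direction (1-D case)
        f_left = fitness(beta + d * direction)     # "smell" at one antenna
        f_right = fitness(beta - d * direction)    # "smell" at the other antenna
        # move toward the antenna that senses the stronger smell
        beta += step * direction * np.sign(f_left - f_right)
        beta = float(np.clip(beta, 0.01, 1.0))     # assumed plausible range for beta
        f = fitness(beta)
        if f > best_fit:
            best_beta, best_fit = beta, f
        d, step = d * decay, step * decay          # shrink antenna length and step size
    return best_beta, best_fit
```

In this reading, `fitness(beta)` would build the zero watermark with the candidate β, apply JPEG compression, rotation, and noise attacks, and return the average NC of the extracted watermarks.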
- Reduction of the redundancy of adjacent bit planes for reversible data hiding in encrypted images. Yuan Yuan, He Hongjie and Chen Fan. doi:10.11834/jig.180305
08-01-2019
Abstract: Objective In existing reversible data hiding algorithms for encrypted images, the correlation between bit planes is not fully utilized when the bit planes are compressed. To reduce the compression ratio of the bit planes and improve the embedding capacity, this study proposes a reversible data hiding algorithm in encrypted images that reduces the redundancy between adjacent bit planes. Method The proposed algorithm is mainly divided into the following steps: image block scrambling, image preprocessing, XOR encryption, additional information embedding, information extraction, and image decryption/recovery. The proposed algorithm initially divides the image into non-overlapping blocks (such as 4×4) and scrambles the positions of these blocks. The values of the bit planes do not change because the scrambling operation does not change the pixel values of the image. In this way, the correspondence between the bit planes is unaffected, and compression can be performed based on their correlation. An XOR operation is performed on adjacent bit planes because the binary images of adjacent bit planes of the original image are similar. After the image is divided and the blocks are scrambled, an XOR operation is performed on the highest and second-highest bit planes to obtain a new second-highest bit plane. Next, the new second-highest bit plane is XORed with the next lower plane to obtain a new third-highest bit plane. The same operation is performed on the remaining lower bit planes to obtain seven new lower bit planes, which are combined with the original highest bit plane to form the eight bit planes of the new image. The compression method of the binary-block embedding (BBE) algorithm identifies the binary image and records the structural information of each block, and it remarkably increases the embedding rate. A bit plane can be regarded as a binary image; thus, this compression method has been applied to grayscale images and has achieved good results. The BBE algorithm is effective at marking blocks and storing structural information; thus, the proposed algorithm also adopts this compression method. We use the BBE algorithm to compress the bit planes of the image after the adjacent-bit-plane XOR operation to accommodate the embedded information. To ensure the security of the encrypted image, the image that vacates room through compression is XOR-encrypted with an encryption key. Additional information that does not exceed the maximum embedding capacity is encrypted and embedded in the vacated space to ensure its security. The extraction and embedding of information are reverse processes; therefore, if the corresponding key is obtained, the information can be extracted without loss. Information extraction and image decryption/restoration depend on the keys held by the user. If the user has the encryption key, then the image can be decrypted or completely restored. For complete restoration, the XOR decryption is first performed on the image; then, the high-order data are restored by the BBE algorithm, and the XOR operation on adjacent bit planes is reversed to obtain a restored image that is consistent with the original image. Image decryption and information extraction are two independent processes, which makes the algorithm separable. If the user has the data encryption key, then the data embedded in the image can be extracted without loss.
Result Experiments show that the XOR operation on adjacent bit planes makes the lower bit planes smooth, except for the highest bit plane. This operation reduces the number of incompressible and poorly compressible blocks in the bit planes, and it is also conducive to compressing the images with the BBE algorithm. Image compression is thereby improved, so considerable space can be freed to embed additional information. In comparison with several existing algorithms for reversible data hiding in encrypted images based on bit-plane compression, the average embedding rate of the proposed algorithm is increased by 0.4 bit/pixel for images with different textures, which illustrates the effectiveness of the proposed algorithm in improving the embedding capacity. The XOR operation on adjacent bit planes not only increases the capacity but also has a smaller time complexity than the BBE method applied directly to grayscale images. In the literature, this operation increases the efficiency of bit-plane coding because prediction and XOR operations make the bit planes smoother. The embedded information can be extracted without loss, and the image can be completely recovered, thereby realizing the complete reversibility of the algorithm. Conclusion Experimental results show that the proposed algorithm provides considerable space to embed additional information and ensures its security, which allows information to be embedded flexibly according to the requirements. Generally, the proposed algorithm performs well, and the XOR operation on adjacent bit planes can be applied to several bit-plane processing algorithms. In the future, our focus will be on applications of the algorithm in real life. In cloud storage, a certain amount of user space and transmission traffic is wasted when the cloud manager does not need to embed much data, even for reversible information hiding algorithms with a high embedding rate. The solution to such practical problems will be investigated in our future work so that reversible data hiding in encrypted images becomes suitable for real-life applications.
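The core preprocessing idea, XORing each bit plane with the plane above it so that similar neighbouring planes become sparse and compress better under BBE, can be sketched in a few lines of NumPy. The abstract does not state whether each XOR uses the original or the already-transformed higher plane, so the version below (original adjacent planes, Gray-code style) is one plausible reading; block scrambling, BBE compression, and encryption are omitted.

```python
import numpy as np

def bitplanes(img):
    """Split an 8-bit grayscale image into bit planes, MSB first."""
    return [((img >> b) & 1).astype(np.uint8) for b in range(7, -1, -1)]

def merge_bitplanes(planes):
    out = np.zeros_like(planes[0], dtype=np.uint8)
    for b, p in zip(range(7, -1, -1), planes):
        out |= p << b
    return out

def xor_adjacent(img):
    """Keep the MSB plane; replace every lower plane by its XOR with the
    plane directly above it, so similar neighbouring planes become sparse."""
    p = bitplanes(img)
    return merge_bitplanes([p[0]] + [p[k - 1] ^ p[k] for k in range(1, 8)])

def undo_xor_adjacent(img):
    """Inverse: cumulative XOR from the MSB downwards restores the planes."""
    q = bitplanes(img)
    restored = [q[0]]
    for k in range(1, 8):
        restored.append(restored[k - 1] ^ q[k])
    return merge_bitplanes(restored)
```

The inverse is a cumulative XOR from the most significant plane downwards, which is what keeps the step lossless and hence compatible with reversible data hiding.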
- Parallax image stitching using line-constraint moving least squares. Fan Yiqing, Li Haisheng and Chu Dongdong. doi:10.11834/jig.180343
08-01-2019
Abstract:Objective Image alignment is a key factor in assessing stitching performance. Image deformation is a critical step of the alignment model for parallax image stitching and directly determines the alignment quality. Accurately aligning all points in an overlapping region of parallax images is difficult. Thus, an alignment strategy that can produce visually satisfying stitching results must be developed. Recent state-of-the-art stitching methods practically combine homography with content-preserving warping. Either homography is first used to pre-align two images and is followed by content-preserving warping to refine alignment, or the mesh deformation is globally optimized by solving an energy function, which is a weighting linear combination of homography and content-preserving warping. Both approaches commonly use homography in the aligning phase and therefore easily produce perspective distortion. At the same time, these approaches possibly misalign the object edges of images with several dominant structural objects. To address these problems, this paper presented a novel stitching method that combines homography, deformation using moving least squares (MLS), and line constraint. The deformation method based on MLS has an interpolation property and can therefore accurately align matching feature points. However, this deformation method may distort structural regions; thus a line constraint item was added to the deformation model to preserve the structure. Method To attain a clear depiction, we considered a two-image stitching as an example. The two input images are called the target and reference images, respectively, and are denoted by T and R. First, feature detection and matching estimation were conducted using SIFT and RANSAC, followed by distance similarity to check the matching accuracy of the feature points. The homography (denoted by H) with the best geometric fit was selected. Then, H was applied to the target image T, and the transformed image was denoted as TH. Afterward, the two group images (T, R) and (TH, R) were aligned using a line constraint MLS. To eliminate perspective distortion in the deformation image, affine transformation was used in MLS. However, a simple affine transformation was insufficient to handle the parallax. Thus, an additional pair of images (TH, R) was processed as a candidate stitching result for the pair of images (T, R). The test experiments revealed that many examples obtained a more natural stitching result when only affine transformation rather than the composite transformation of homography and affine transformation was applied, implying that the alignment between T and R was better than that between TH and R. Taking the deformation from the target image T to the reference image R as an example, the line constraint MLS was outlined as follows. First, the four corner points of T were deformed to the coordinate system of R by using matching feature points as control points based on MLS. Then, we deformed the remaining points on the four border lines (top, bottom, left, and right boundaries) of T by using line constraint MLS. Here, the line constraint was constructed by preserving the relative position of each point of a border line, based on which a deformation objective function was developed. Similarly, we handled the internal points of T by using vertical and horizontal grid lines as constraint conditions, and the vertical and horizontal grid lines are consisted of the constraint lines of their intersection point. 
Finally, the quality of each alignment was evaluated, and the best one was chosen to blend them. In the overlapping regions, the max-flow min-cut algorithm was used to find the best stitching seam-cut of two alignments and assess the alignment quality along the seam-cut. The assessment of the alignment quality mainly considered the color and structural differences between overlapping regions of two images, and the structure was reflected by a gradient. Then, feathering approach was utilized to blend the two images of the best alignment. Result To test our stitching algorithm, 23 pairs of pictures, which cover commonly seen natural and man-made scenes, were captured. In addition, we conducted several experiments on publicly published data provided by recent related works. The experimental results demonstrated that the alignment accuracy of our method exceeded 95%, and the ratio of perspective distortion was lower than 17%. Compared with recent state-of-the-art methods, our method's alignment accuracy was higher by 3%, and the ratio of perspective distortion was lower by 73%. Therefore, our method exhibits a better performance in handling image stitching with a large parallax, and the stitching result is authentic and natural. Conclusion This paper presented a hybrid transformation for aligning two images that combines line constraint with MLS. In addition, an alignment quality evaluation rule was introduced by computing the weighted differences of the points along the stitching seam-cut and the remaining points in the overlapping region. As the proposed method can balance alignment accuracy and structure preservation, it can address the misalignment issues easily caused by current stitching approaches for parallax images and effectively reduce stitching artifacts, such as ghosting and distortion.
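The deformation building block referred to above, moving least squares with an affine transformation, can be sketched as follows. This is the standard affine MLS (Schaefer et al.), not the authors' line-constraint formulation: control points p map to q, α controls how fast the weights fall off, and the small eps (an added assumption) avoids division by zero at the control points at the cost of exact interpolation there.

```python
import numpy as np

def mls_affine_deform(v, p, q, alpha=1.0, eps=1e-8):
    """Affine moving least squares: map a query point v (shape (2,)) given
    control points p -> q, both of shape (n, 2)."""
    w = 1.0 / (np.sum((p - v) ** 2, axis=1) ** alpha + eps)   # per-point weights
    p_star = (w[:, None] * p).sum(0) / w.sum()                # weighted centroids
    q_star = (w[:, None] * q).sum(0) / w.sum()
    p_hat, q_hat = p - p_star, q - q_star
    # weighted normal equations for the best-fitting affine matrix M (2x2)
    A = (w[:, None, None] * p_hat[:, :, None] * p_hat[:, None, :]).sum(0)
    B = (w[:, None, None] * p_hat[:, :, None] * q_hat[:, None, :]).sum(0)
    M = np.linalg.solve(A, B)
    return (v - p_star) @ M + q_star                          # deformed position
```

In the paper, this basic mapping is additionally constrained so that points on border lines and grid lines keep their relative positions, which is what preserves dominant structures.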
Image Analysis and Recognition
- Ocean eddy recognition based on multi-feature fusion in complex environment. Huang Dongmei, Liu Jiajia, Su Cheng and Du Yanling. doi:10.11834/jig.180324
08-01-2019
Abstract: Objective Ocean eddy recognition has become one of the hotspots in the field of physical oceanography research. The rapid and continuous change of ocean eddies brings great challenges to their accurate recognition. On the one hand, the ocean environment that produces ocean eddies is complex and variable. On the other hand, ocean eddies change rapidly and continuously, which causes dramatic changes in their morphological structure and motion state during their movement. The development of highly timely, large-scale Earth observation technology provides a rich data foundation for research on the rapid and continuous change of ocean eddies, which greatly promotes research on ocean eddy recognition. However, traditional manual visual recognition methods have significant limitations. Artificially recognizing ocean eddies one by one in a large data set is an impossible task, and the manual visual recognition of ocean eddies is influenced by subjective judgment. Traditional methods therefore show significant uncertainty in their recognition results due to subjective differences, which introduce errors that are difficult to treat statistically. Therefore, the use of computer technology to automatically recognize ocean eddies is important. Ocean eddies cause ocean water to gather or scatter, producing obvious changes in surface roughness. Their irregular spiral structures appear as bright or dark stripes in SAR images and have rich texture and contour features, which provide abundant information for ocean eddy recognition. Traditional ocean eddy recognition methods are mainly based on physical parameters, vector geometry, or a hybrid of the two with specific threshold values. These methods can achieve good results in certain ocean areas, but the selection of the threshold largely determines their recognition accuracy. In addition, the morphological structures of ocean eddies vary greatly under different ocean states. Moreover, the complex and variable environment causes the rapid and continuous change of ocean eddies. Therefore, determining a suitable threshold in traditional methods is difficult. Setting thresholds based on expert knowledge is subjective and uncertain, which often leads to omissions, misjudgments, and a lack of generality. To solve these problems, we propose an automatic ocean eddy recognition method with generalization ability based on multi-feature fusion in complex environments. Method Our method includes data preprocessing, feature extraction, multi-feature fusion, and classifier training. First, the data set is extended through preprocessing, including random cropping, scale transformation, and rotation transformation, to improve the robustness of our method. We derived our data set from SAR images generated by the ENVISAT and ERS-2 satellites between 2005 and 2010. In this paper, 136 SAR images with and without ocean eddies are included in the data sets. In actual applications, the construction of large-scale data sets requires high labor costs, especially for ocean eddy data sets. Moreover, it places high demands on data set builders, which further increases the difficulty of constructing the data set. Adequate and diverse data sets are the key to recognition algorithms in the field of image recognition. Therefore, we use data augmentation to extend the data set before ocean eddy recognition.
Second, the gray-level co-occurrence matrix (GLCM), Fourier descriptor (FD), and Harris corner features are extracted. Ocean eddies in SAR images have abundant features, such as shape, texture, and color characteristics, but a single feature is not enough to accurately recognize them because of complicated ocean states, weather changes, equipment disturbances, and various kinds of interference. The GLCM can represent comprehensive information about the image in terms of direction, interval, range of change, and speed of change. The FD is related to the size, direction, and starting point of the shape according to the properties of the Fourier transform. In areas with rich texture information, the Harris operator can extract many useful feature points, whereas in areas with less texture information, fewer feature points are extracted; it is a relatively stable point feature extraction operator. Then, we extract the averaged GLCM feature and utilize principal component analysis (PCA) to reduce the feature dimensions of the FD descriptors and Harris corners. In the calculation of the GLCM feature, four directions are considered, at 0°, 45°, 90°, and 135°, representing east-west, northeast-southwest, south-north, and southeast-northwest, respectively. The dimensions of the FD and Harris features are reduced using PCA because their dimensionality is too high; otherwise, the learning time of the classifier would be very long and the classification ability would decrease. Third, the processed feature vectors are serially fused. Finally, the recognition of ocean eddies is achieved by the classifier. In this paper, 10-fold cross-validation is used to test the accuracy of the algorithm. Simultaneously, three typical classification algorithms are adopted, including support vector machine (SVM), decision tree (DT), and multi-layer perceptron (MLP). Result The results indicate that the accuracy of the proposed multi-feature method is higher than that of single-feature methods, and the recognition accuracy of the DT classification algorithm is the highest, reaching 86.9045%. The PCA dimensionality reduction method can effectively improve the recognition accuracy. The FD and Harris feature dimensions are reduced by PCA, and their recognition accuracy is improved from 83.9060% to 86.9045% and from 84.0097% to 85.3547%, respectively. Moreover, their robustness to a variety of morphological changes of ocean eddies is good. Conclusion This method can be used to distinguish whether SAR images contain ocean eddies. It fuses three kinds of image features, namely, GLCM, FD, and Harris corners, which effectively overcomes the shortcomings of traditional methods based on threshold setting and single features and improves the recognition accuracy of ocean eddies in SAR images. Hence, multi-feature fusion improves the recognition accuracy to a certain extent compared with single-feature recognition. Our method is suitable for the recognition of ocean eddies in complex environments and has good generality.
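As a rough illustration of the fusion pipeline, the sketch below computes averaged GLCM statistics with scikit-image, reduces precomputed FD and Harris feature matrices with PCA, serially concatenates the three blocks, and evaluates a decision tree with 10-fold cross-validation. The quantization level, PCA dimensionality, and the assumed helper inputs `fd_feats` and `harris_feats` are illustrative choices, not values from the paper (older scikit-image spells the functions `greycomatrix`/`greycoprops`).

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

def glcm_features(img, levels=32):
    """Averaged GLCM statistics over the 0/45/90/135 degree directions."""
    g = (img / (256 // levels)).astype(np.uint8)          # quantize 8-bit image
    glcm = graycomatrix(g, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=levels, symmetric=True, normed=True)
    props = ["contrast", "correlation", "energy", "homogeneity"]
    return np.array([graycoprops(glcm, p).mean() for p in props])

def fuse_features(imgs, fd_feats, harris_feats, n_pca=20):
    """Serially fuse averaged GLCM features with PCA-reduced FD and Harris
    features (fd_feats and harris_feats are assumed precomputed per image)."""
    glcm = np.vstack([glcm_features(im) for im in imgs])
    fd = PCA(n_components=n_pca).fit_transform(fd_feats)
    harris = PCA(n_components=n_pca).fit_transform(harris_feats)
    return np.hstack([glcm, fd, harris])

# 10-fold cross-validation with a decision tree, as in the paper's evaluation:
# X = fuse_features(images, fd_feats, harris_feats); y = labels (eddy / no eddy)
# print(cross_val_score(DecisionTreeClassifier(), X, y, cv=10).mean())
```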
- Level set method based on local connection operator and difference operator for segmenting texture images. Zhou Li and Min Hai. doi:10.11834/jig.180435
08-01-2019
Abstract: Objective Images with complex texture features are difficult to segment. Many methods, such as the Gabor filter and the structure tensor, have been proposed to segment texture images. However, these methods present a number of drawbacks. For example, methods based on the Gabor filter must set multiple scales and directions, and the desirable scale and direction parameters are difficult to select. Methods based on the structure tensor must set two fixed directions to analyze intensity variations, yet many texture images include irregular direction features. Moreover, utilizing only the intensity variation information is an inadequate strategy. Morphology similarity and direction uncertainty often exist in natural texture images, and existing methods cannot segment such images accurately. Method Considering the morphology similarity and direction uncertainty of local texture patterns, we proposed local connection and difference operators to extract the texture features of images. On the one hand, by setting a threshold for each local region, the local connection operation can be used to analyze the intensity distribution and texture morphology features. On the other hand, by considering the intensity variation trait of texture images, the local difference operation can be utilized to extract the local intensity variation. The local connection operation and the local difference operation are complementary. Then, by combining the local similarity and local difference features with the intensity information, the level set energy function was constructed and further minimized to obtain the final segmentation results. The main advantages of the proposed method can be summarized in the following two points. First, the morphology feature was proposed to analyze the local intensity distribution of texture images. The intensity distributions of texture images are uncertain for each object region; thus, extracting the morphology feature of a local region is of great significance. Second, the two proposed operators are complementary. Using only one feature to segment complex texture images is a challenging task. The morphology and intensity difference features can jointly segment images, suggesting that the proposed method can be robust for different texture images. Result We verified the effectiveness of the two operators by comparing them with other operators, such as the Gabor filter, structure tensor, extended structure tensor, and local similarity factor. Specifically, we first analyzed the extraction effect of the morphology feature by comparing the proposed local connection operator with methods based on the Gabor filter, the structure tensor, the extended structure tensor, and the local similarity factor. The feature extracted by the local connection operator can be used to efficiently discriminate the object and background regions. Then, the local difference operator was compared with traditional texture feature extraction methods; it can accurately extract the local intensity variation. In addition, we conducted three comparison experiments to confirm whether the proposed method can achieve a better segmentation effect than the methods based on the Gabor filter, structure tensor, extended structure tensor, and local similarity factor. Finally, we tested the robustness of the proposed method for different initial contours and images.
The segmentation accuracy of our proposed method exceeded 97%, which was considerably higher than that of other methods. That is, the proposed operator is more effective in discriminating the different texture patterns of images. Conclusion We proposed two complementary operators, namely, the local connection operator and the local difference operator, to effectively extract texture features. The two extracted features are complementary and can be jointly utilized to segment texture images. Finally, we validated that the proposed method can obtain better segmentation results for natural texture images.
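The abstract does not give the exact formulas of the two operators, so the sketch below is only one plausible reading: the local difference operator is taken as the mean absolute deviation from the local mean, and the local connection operator as the relative size of the connected component (under a local-mean threshold) that contains the window centre. The window size and the direct, slow loop are purely for illustration.

```python
import numpy as np
from scipy.ndimage import uniform_filter, label

def local_difference(img, size=7):
    """Local intensity-variation feature: mean absolute deviation from the
    local mean inside a size x size window."""
    img = img.astype(float)
    mean = uniform_filter(img, size)
    return uniform_filter(np.abs(img - mean), size)

def local_connection(img, size=7):
    """Morphology-like feature: binarize each neighbourhood by its local mean
    and measure the size of the connected component containing the centre."""
    img = img.astype(float)
    mean = uniform_filter(img, size)
    r = size // 2
    pad = np.pad(img, r, mode="reflect")
    out = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            win = pad[i:i + size, j:j + size]
            binary = win >= mean[i, j]              # threshold by the local mean
            labels, _ = label(binary)
            centre_lab = labels[r, r]
            out[i, j] = (labels == centre_lab).sum() / size ** 2 if centre_lab else 0.0
    return out
```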
Image Understanding and Computer Vision
- Traffic video significance foreground target extraction in complex scenes. Lang Hong, Ding Shuo, Lu Jian and Ma Xiaoli. doi:10.11834/jig.180313
08-01-2019
Abstract: Objective In urban traffic detection, the wide application of intelligent video surveillance has drawn research interest in using artificial intelligence and advanced computer vision technology to retrieve and recognize foreground objects in video and to analyze them further through, for example, feature extraction and abnormal behavior analysis. However, in complex environments, the discontinuity of the dynamic background causes the loss of a small part of the foreground target information, false detections, and misjudgments. Constructing an effective and high-performance extractor involves two core issues. The first issue is detection speed and efficiency. Because video data are large, detection efficiency can be greatly improved if the frames that contain no foreground object are identified and eliminated in advance, so that only images with a salient foreground target are processed. The second issue is object integrity in complex environments: effectively extracting the foreground part of the video sequence is the key to the reliability of subsequent algorithms. Method This paper proposes a robust principal component analysis (RPCA) optimization method. The classical RPCA detection method uses the l0-norm to independently determine whether each pixel belongs to a moving target and is not conducive to eliminating unstructured sparse components caused by noise and random background disturbances. This paper aims to maintain the good robustness of the algorithm in complex environments and to optimize the initial RPCA-filtered image. To quickly screen and track the foreground target, a fast extraction algorithm for salient target frame numbers is designed based on the frame-difference Euclidean distance to determine the detection range in the neighborhood of key frames. Through the establishment and solution of sparse low-rank models, and based on the initially filtered foreground target image, parallel recognition of foreground target seeds is performed to remove the dynamic background from the foreground target image. Moreover, as observed from several mask images after gray value inversion, foreground target pixels have small gray values and strong directionality. Therefore, the design ideas of the parallel recognition and optimized connection method for foreground target seeds are as follows: 1) using gray pixel seed recognition, gray value inversion of the source image, and verification according to gray scale and symmetry detection, grayscale pixels are identified as foreground and non-foreground target sub-blocks; 2) grayscale pixels are optimized for connection, and foreground target seeds are connected according to gray values and directional similarity, followed by fusion and multi-template denoising; 3) seed filling is used for foreground targets to enhance connectivity and make the targets more complete. Simultaneously, the foreground objects in the mask image are classified into regular and irregular classes. For irregular targets that tend to break apart, such as pedestrians and animals, a vertical seed growth algorithm is designed within the target region. For regular foreground targets, such as cars and ships, the foreground seeds in the region are connected vertically and horizontally to remove holes and the impact of missing structural information.
Result The foreground target extraction is highly robust in complex environments with challenging interference factors. In the four groups of classic videos from the database and the two videos of the Shanxi Taichang Expressway, the dynamic backgrounds include flowing water, swaying leaves, slight camera jitter, and changing light and shadows. The experimental results were analyzed from three perspectives: the application effect, the accuracy of foreground target localization, and the integrity of foreground target detection. The proposed salient target extraction algorithm achieved an average accuracy of 90.1%, an average recall of 88.7%, and an average F value of 89.4%, all of which are superior to other similar algorithms. For the Gaussian mixture model and the optical flow algorithm, by comparison, the complex background introduces large noise disturbances. The Gaussian mixture model uses a morphological algorithm to remove noise and fill holes, which leaves the detected foreground target with more adhesion, and its detection effect varies greatly across different shadow areas. Furthermore, the optical flow algorithm is sensitive to light, and changing light is mistaken for optical flow, so it is not suitable when environmental requirements are strict. Conclusion In this paper, by quickly locating the salient foreground, a parallel seed recognition and optimized connection algorithm for the initial RPCA screening image is proposed. The qualitative and quantitative analyses of the experimental data show that the algorithm can separate the foreground target from the dynamic background more quickly, reduce the adhesion between the foreground object and the background, and more effectively retain the structural information of the foreground object in the original image. In subsequent studies, deficiencies in the overall model and the algorithm details will be further optimized. Shadow suppression can be combined to improve robustness to abnormal lighting, and the performance and effectiveness of the algorithm can be improved in more complex settings, such as drone videos, which provides data support for feature extraction and abnormal behavior analysis.
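The key-frame screening step, flagging only those frames whose frame-difference Euclidean distance suggests a salient foreground, might look like the short sketch below; the adaptive threshold rule (a multiple of the mean inter-frame distance) is an assumption for illustration, not the rule used in the paper.

```python
import numpy as np

def candidate_frames(frames, ratio=1.5):
    """Flag frames whose Euclidean distance to the previous frame is well
    above the average inter-frame distance; only these (and their
    neighbourhoods) would be passed on to the RPCA foreground stage."""
    flat = [f.astype(float).ravel() for f in frames]
    dist = np.array([np.linalg.norm(flat[k] - flat[k - 1])
                     for k in range(1, len(flat))])
    thresh = ratio * dist.mean()                 # illustrative adaptive threshold
    return [k for k, d in enumerate(dist, start=1) if d > thresh]
```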
- Indoor scene segmentation based on fully convolutional neural networks. Huang Long, Yang Yuan, Wang Qingjun, Guo Fei and Gao Yong. doi:10.11834/jig.180364
08-01-2019
Abstract: Objective Vision is one of the most important ways by which humans obtain information. A visual prosthesis implants electrodes into the body of a blind person to stimulate the optic nerve so that the person can perceive light sensations. Therefore, the objects perceived by the blind show only general features, with low resolution and poor linearity, and in some cases the blind can hardly distinguish these optical illusions. Before the electrodes are stimulated, image segmentation is adopted to display the general position and outline of objects and thus help blind people clearly recognize every familiar object. A fast image segmentation method based on a convolutional neural network was proposed to segment indoor scenes for visual prostheses according to their application features. Method According to the demand of visual prostheses for real-time image processing, the fast fully convolutional network (FFCN) structure proposed in this paper was built on the AlexNet classification network. AlexNet reduced the top-five error rate on the ImageNet dataset to 16.4%, which was better than the 26.2% of the second-place method. AlexNet uses convolution layers to extract deep feature information, adds overlapping pooling layers to reduce the parameters that must be learned, and uses the ReLU activation function to solve the gradient diffusion of the Sigmoid function in deeper networks. In contrast to other networks, it is lightweight and fast to train. First, the FFCN for image segmentation of indoor scenes was constructed. It was composed of five convolution layers and one deconvolution layer, and the loss of image feature information caused by successive convolutions was avoided by scale fusion. To verify the effectiveness of the network, a dataset of basic items that can be touched by the blind in an indoor environment was created. The dataset was divided into nine categories and included 664 items, such as beds, seats, lamps, televisions, cupboards, cups, and people (XAUT dataset). The type of each item was marked by grayscale in the original image, and a color table was added to map the gray image into a pseudo-color map as the semantic label. The XAUT dataset was used to train the FFCN under the Caffe framework, and image features were extracted using the deep feature learning and scale fusion of the convolutional neural network to obtain an indoor scene segmentation model adapted to visual prostheses for the blind. To assess the validity of the model, traditional models, including FCN-8s, FCN-16s, FCN-32s, and FCN-8s at-once, were also fine-tuned on the same dataset to obtain corresponding indoor scene segmentation models. Results A comparative experiment was conducted under Ubuntu 16.04 on an Amax server. Model training lasted 13 h, and a model was saved every 4 000 iterations. Tests were performed at 4 000, 12 000, 36 000, and 80 000 iterations. The pixel recognition accuracy of all the networks exceeded 85%, and the mean IU was above 60%. The FCN-8s at-once network had the highest mean IU (70.4%), but its segmentation speed was only one-fifth of that of the FFCN. With the other indicators differing insignificantly, the average segmentation speed of the FFCN reached 40 frames/s.
Conclusion The FFCN can effectively use multi-layer convolution to extract picture information and avoid the influences of the underlying information, such as brightness, color, and texture. Moreover, it can avoid the loss of image feature information in the network convolution and pool through scale fusion. Compared with other FCN networks, the FFCN has a faster speed and can improve the real-time image preprocessing.
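For orientation, a minimal PyTorch sketch of a network with the same overall shape, five convolution stages, a 1×1 score layer for the nine XAUT classes, and a single transposed convolution back to input resolution, is given below. The channel widths, kernel sizes, and strides are guesses; only the coarse architecture follows the abstract, and the scale-fusion connections are omitted for brevity.

```python
import torch
import torch.nn as nn

class FFCNSketch(nn.Module):
    """Rough sketch of an AlexNet-style fast FCN: five conv layers, a score
    layer, and a single transposed convolution back to input resolution."""
    def __init__(self, num_classes=9):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2, padding=3), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 5, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.score = nn.Conv2d(256, num_classes, 1)
        # one deconvolution undoes the 8x downsampling of the feature maps
        self.upsample = nn.ConvTranspose2d(num_classes, num_classes,
                                           kernel_size=16, stride=8, padding=4)

    def forward(self, x):
        h = self.upsample(self.score(self.features(x)))
        return h[:, :, :x.shape[2], :x.shape[3]]   # crop to the input size
```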
- Multi-type flame detection combined with Faster R-CNN. Hui Tian, Halidan·Abudureyimu and Du Han. doi:10.11834/jig.180430
08-01-2019
Abstract: Objective Flame detection can effectively prevent the occurrence of fire. Current flame detection methods based on traditional image processing techniques have low anti-interference ability and poor generalization, and their detection effect is highly sensitive to data fluctuations. Machine learning methods must design and extract suitable flame features for different scenarios, and this process is complex. To avoid complex artificial feature extraction and ensure good detection accuracy in the presence of complex backgrounds, lighting changes, and various forms of flame images, a multi-type flame detection method based on Faster R-CNN was proposed. Method This method is based on deep learning and uses convolutional neural networks to automatically learn image features. First, visual tasks were established using self-built datasets. According to the sharp-angle characteristics of the fire, the visual shape, and the amount of smoke, the flame data were divided into three types, namely, single-point, multi-point, and shapeless flames. In addition, in-depth network feature visualization experiments revealed that artificial light sources and flames have similar contours. Thus, two datasets of artificial light sources (circular and square) were established as interference items to ensure the stability of the detection model. Then, the training parameters were refined, and the pre-trained convolutional neural network structure was adjusted; the classification layer was modified to satisfy the specific visual tasks. The image features abstracted by the convolutional and pooling layers of the deep convolutional neural network were sent to the region proposal network for regression calculation, and the corresponding detector for each type of target object was obtained by a transfer learning strategy. Finally, the target detection model based on the visual tasks was obtained, and the weights and bias parameters were saved. The sub-detectors for the various target objects were used in parallel as the overall detector. The scores of the various detectors were output during detection, and the highest score was taken as the correct detection. Result First, the trained detectors and the corresponding test datasets were used for testing. Then, the test sets of the various targets were used to test the detection effect of the other types of detectors to examine the mutual exclusivity between detectors. The experiments demonstrated that all the detectors had high specificity, which greatly reduced the possibility of misjudgment, and good detection accuracy for flame images with sharp deformations and complex backgrounds. The detection model obtained through training achieved good results when dealing with difficult situations, such as small targets, multiple targets, various forms, complex backgrounds, and lighting changes. The results showed that the average accuracy of the various types of detectors increased by between 3.03% and 8.78%. Conclusion The proposed flame detection method subdivides the flame category by exploiting the visual morphological characteristics of the flame and uses a deep convolutional neural network instead of a manual feature design and extraction process. By combining self-built datasets and network models that were modified for the visual tasks, a multi-type flame detection model with good detection results was obtained. By using deep learning, tedious artificial feature extraction is avoided, and a good detection effect is achieved.
In addition, the model has a strong anti-interference ability. This article provides a more general and concise solution to the problem of flame detection.
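In practice, the "modified classification layer" step maps naturally onto the torchvision Faster R-CNN API, as sketched below. The choice of six classes (background plus three flame types and two artificial light-source types) is my reading of the abstract, and the ResNet-50 FPN backbone and pretrained weights are illustrative defaults (the `weights="DEFAULT"` argument needs torchvision 0.13 or later), not necessarily what the authors used.

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

def build_flame_detector(num_classes=6):
    """Faster R-CNN with its classification head replaced; num_classes counts
    background + the five target types (single-point, multi-point, and
    shapeless flames, plus circular and square artificial light sources)."""
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model
```

Training then proceeds with the standard detection losses returned by `model(images, targets)` in training mode.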
Computer Graphics
- Shape adjustable transition curves with arbitrary parameter continuity. Yan Lanlan, Fan Jiqiu and Zhou Qihua. doi:10.11834/jig.180231
08-01-2019
Abstract: Objective Existing research has failed to provide a general expression of the polynomial potential function that enables the transition curve to reach C^k (where k is an arbitrary natural number) continuity at the endpoints. This research aims to solve this problem in a simple and effective manner. Method First, by using the equation of the transition curve, the kth derivative of the transition curve is obtained with the help of the Leibniz formula. According to the predetermined continuity goal, the basic conditions that the potential function should meet to enable the transition curve to reach C^k continuity at the endpoints are deduced. Second, according to the total number of conditions contained in the basic conditions and those corresponding to other expectations of the potential function and transition curve, the degree of the polynomial potential function is determined. The potential function is expressed as a linear combination of the Bernstein basis functions of the same degree, with combination coefficients to be determined. Finally, according to the basic and other expected conditions to be satisfied by the potential function, as well as the function and derivative values of the Bernstein basis functions at the endpoints, an equation set for the undetermined coefficients is obtained. By solving the equation set, the unified expression of the polynomial potential function, which satisfies all expected goals and contains a free parameter, is obtained. Result Two parameters exist in the potential function, namely, k and λ. Parameter k is used to control the continuity order between the transition and initial curves at the endpoints. After k is determined, parameter λ can be used to control the degree of proximity between the transition and initial curves. The potential function has symmetry, the midpoint property, and boundedness. The monotonicity of the potential function with respect to the variable t and the parameter λ is analyzed when k is fixed. The range of the free parameter for which the curve of the potential function has a unique inflection point is also analyzed. For general parameter values, the transition curve constructed by the potential function reaches C^k continuity at the endpoints; for special parameter values, the transition curve can reach C^(k+1) continuity. The shape characteristics of the transition curve are further analyzed: when the value of k is fixed, the greater the value of λ, the closer the transition curve is to the initial curve. Conclusion The numerical examples verify the correctness of the theoretical analysis and the effectiveness of the proposed method.
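To make the construction tangible, the SymPy sketch below solves the fully determined case: assuming the common blend form C(t) = (1 − f(t))·C0(t) + f(t)·C1(t) (an assumption, since the abstract does not give the transition-curve equation), C^k contact at both ends follows if f(0) = 0, f(1) = 1, and the first k derivatives of f vanish at both endpoints. A degree 2k+1 polynomial in Bernstein form then has exactly as many coefficients as conditions; the paper's extra free parameter λ would come from allowing one more degree, which this sketch omits.

```python
import sympy as sp

def potential_function(k):
    """Solve for a polynomial f of degree 2k+1 (in Bernstein form) with
    f(0)=0, f(1)=1 and vanishing derivatives up to order k at both ends."""
    t = sp.symbols("t")
    n = 2 * k + 1
    c = sp.symbols(f"c0:{n + 1}")                      # Bernstein coefficients
    bern = [sp.binomial(n, i) * t**i * (1 - t)**(n - i) for i in range(n + 1)]
    f = sum(ci * bi for ci, bi in zip(c, bern))
    eqs = [sp.Eq(f.subs(t, 0), 0), sp.Eq(f.subs(t, 1), 1)]
    for j in range(1, k + 1):                          # endpoint derivative conditions
        dj = sp.diff(f, t, j)
        eqs += [sp.Eq(dj.subs(t, 0), 0), sp.Eq(dj.subs(t, 1), 0)]
    sol = sp.solve(eqs, c)
    return sp.simplify(f.subs(sol))

print(potential_function(2))   # degree-5 potential function, C^2 contact at both ends
```

For k = 2 this reproduces the familiar blending polynomial 6t^5 − 15t^4 + 10t^3.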
- C^2 Hermite interpolation based on quartic rational parabolic-PH curves by using Möbius transformation. Guo Yu, Jiang Ping, Wang Jianmin and Liu Zhi. doi:10.11834/jig.180318
08-01-2019
Abstract: Objective The offset curve, also known as the parallel curve, is the locus of points at distance d along the normal vector direction. In recent years, the offset curve has played an important role in many fields and is widely applied in computer-aided geometric design (CAGD). In general, the arc length and offset curve of a polynomial curve have no rational form, and the offset-rational (OR) curve is a special polynomial parametric curve whose offset curves are exactly rational. The special properties of this curve have attracted the attention of many researchers, and the interpolation problem of OR curves has been widely studied in recent years. Curve interpolation is widely used in many modern industrial fields, such as robot design, the machinery industry, and the space industry. Hermite interpolation of given endpoint data is a common method for constructing a curve in CAGD. The C^2 Hermite interpolation problem of the quartic parabolic-PH curve, an even-degree offset-rational curve, is discussed in this paper. Method Based on the parameter introduced by the Möbius transformation, a bijective linear fractional transformation, the C^2 Hermite interpolation of quartic rational parabolic-PH curves is constructed through complex analysis. The data H_C2 = {R0, R1, T0, T1, V0, V1} to be interpolated are given, with R0 and R1 denoting the two endpoints, T0 and T1 the tangent vectors at R0 and R1, and V0 and V1 the second-order derivative vectors at R0 and R1. By an appropriate translation, rotation, and scaling that make R0 = 0 and R1 = 1, we can further obtain the interpolation conditions for C^2 curves after the Möbius transformation. This paper shows a concrete construction method of quartic rational parabolic-PH curves for C^2 Hermite interpolation, whose tangent vectors are of degree three. By assuming expressions for r(t), F(t), and G(t), the first- and second-order derivatives of the curve can be obtained. The corresponding expressions of the control points and the Bézier curve can be obtained by using the integral relation formula. The exact values of the parameters are calculated from the C^2 Hermite interpolation conditions of the curve. Then, the quartic rational parabolic-PH curves formed by the Möbius transformation are finally constructed. Result By providing a set of "reasonable" endpoint data to be interpolated, we can obtain 12 C^2 Hermite interpolation curves from the transformed quartic polynomial parabolic-PH curve under the initial interpolation conditions and further obtain the C^2 Hermite interpolation curves of the 12 quartic rational parabolic-PH curves under the initial interpolation conditions. Numerical examples show the effectiveness of the algorithm. Choosing the appropriate interpolation curve from the 12 curves is neither obvious nor convenient. We need to select the curves that satisfy the interpolation conditions and handle inflection points elastically; other interpolation curves may have cusps, nodes, or closed loops, or may be obviously inconsistent with geometric design requirements. By combining the minimum absolute rotation number with elastic bending energy minimization, a selection method for determining the optimal curve satisfying the interpolation conditions is put forward. When the absolute rotation number and the elastic bending energy of the interpolated curve are minimized, the optimal curve is usually obtained, which has better smoothness and a more natural shape that meets the needs of geometric design.
The examples illustrate that the traditional quartic parabolic-PH curve can be used to construct C^1 Hermite interpolation curves; however, the constraints of the interpolation conditions do not allow the direct construction of a curve with higher continuity. For the traditional quintic PH curve, we also cannot directly construct a curve with continuity higher than C^1, whereas through the Möbius transformation we can achieve C^2 Hermite interpolation, which has higher continuity than the traditional method. For the same set of given data, we construct the C^2 Hermite interpolation curves from the quintic rational PH curve and the quartic rational parabolic-PH curve. Compared with the 18 quintic rational PH curves, the optimal curve obtained from the 12 quartic rational parabolic-PH curves has lower elastic bending energy. Hence, the quartic rational parabolic-PH curves constructed by our method have more natural geometry than the traditional quintic rational PH curves. Although parabolic-PH curves of degree eight can be used to construct C^2 Hermite interpolation curves, the solution is complex and the computation is heavy. Hence, through analysis and comparison, the quartic rational parabolic-PH curve presented in this paper requires simpler computation than quintic PH curves and degree-eight parabolic-PH curves. The interpolation effect of the quartic rational parabolic-PH curve is more obvious, and the optimal curve best meets the requirements of geometric design. Conclusion The C^2 Hermite interpolation of quartic rational parabolic-PH curves constructed by introducing the Möbius transformation not only keeps the degree of the interpolation curve low but also achieves higher continuity of the interpolation conditions. It makes the calculation simpler and the interpolation effect more obvious compared with traditional PH curves of odd degree. This work is of certain significance for related research on even-degree PH curves.
Medical Image Processing
- Multi-channel diffusion tensor imaging registration method based on active demons algorithm by using variable parameters. Zhao Jie, Xu Xiaoying, Liu Jing and Du Yuhang. doi:10.11834/jig.180281
08-01-2019
Abstract: Objective Diffusion tensor imaging (DTI) is widely recognized as the most attractive non-invasive magnetic resonance imaging method. DTI is sensitive to subtle differences in the orientation of white matter fibers and in diffusion anisotropy. Hence, it is a powerful method for studying brain diseases, such as Alzheimer's disease, Parkinson's disease, and multiple sclerosis, and for group studies. DTI registration is a prerequisite for these studies, and its quality directly affects the reliability and completeness of follow-up medical research and clinical diagnosis. DT images contain much information about the direction of brain white matter fibers. DTI registration not only requires anatomical consistency between the reference image and the moving image after registration but also demands consistency between the diffusion tensor direction and the anatomical structure. DTI registration based on the demons algorithm, which uses the six independent components of the tensor as inputs, can fully use the direction information of the diffusion tensor data and improve the quality of registration. However, this algorithm does not perform well in large deformation areas, and its convergence speed is slow. The active demons algorithm can accelerate convergence to some extent, but the internal structure of the moving image is prone to being torn, deformed, and folded due to the presence of false demons forces, which can alter the topological structure of the moving image. To solve these problems, this paper proposes a multi-channel DTI registration method based on the active demons algorithm with variable parameters. Method The active demons algorithm is introduced into multi-channel DTI registration. By analyzing the influence of the homogenization coefficient and the balance coefficient of the active demons algorithm on DTI registration, and by combining the advantage of the balance coefficient in improving convergence speed with that of the homogenization coefficient in enhancing the accuracy of multi-channel DTI registration, an appropriate homogenization coefficient is first selected manually within a reasonable range. Then, the balance coefficient is dynamically adjusted along with the decreasing Gaussian kernel during the convergence of the proposed algorithm. A smaller balance coefficient is used in the initial stage of DTI registration for faster convergence, and the balance coefficient is then gradually increased for a smaller registration error. To verify whether the proposed multi-channel DTI registration method based on the active demons algorithm with variable parameters statistically improves the registration compared with the demons and active demons methods, 10 pairs of DTI data volumes of patients with Alzheimer's disease are used for registration. The mean square error (MSE) and the overlap of eigenvalue-eigenvector pairs (OVL) obtained from the three DTI registration methods are used for paired t tests. Result When the demons algorithm is used for multi-channel DTI registration, a good registration effect is achieved in small deformation areas; however, the registration effect in larger deformation areas is not ideal, and the convergence rate is slow. The homogenization coefficient in the active demons method resolves the registration problem in large deformation areas, but the image topology will change if the homogenization coefficient is too small.
Although faster convergence can be achieved by fixing the homogenization coefficient and introducing a single balance coefficient, the topological structure of the image changes at the same time. Compared with the multi-channel DTI registration methods based on the demons and active demons algorithms, the convergence speed of the proposed approach is increased, the registration effect in large deformation areas is significantly improved, and the topological consistency of the image before and after registration is preserved. Moreover, the minimum MSE and maximum OVL values for the 10 sets of DTI data are obtained after registration with the proposed method. At the 0.05 significance level, significant differences are found in the MSE and OVL values between the active demons algorithm with variable parameters and the active demons algorithm, and between the active demons algorithm with variable parameters and the demons algorithm (p<0.05). Conclusion The use of variable parameters in the proposed DTI registration method not only effectively improves the registration accuracy and speed but also enhances the demons-based registration of large deformation areas of DT images. At the same time, it maintains the topological structure of DT images before and after registration, addressing one of the major drawbacks of the multi-channel DTI registration method based on the active demons algorithm. The experimental results indicate that the multi-channel DTI registration method based on the active demons algorithm with variable parameters is suitable for the registration of DT images with large deformation areas between individuals.
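For readers unfamiliar with the demons family, one iteration of an active-demons-style update for a single 2-D channel is sketched below; in the multi-channel DTI setting the same force would be accumulated over the six tensor components. The placement of the homogenization coefficient α and the balance coefficient κ follows common formulations and is an assumption, as is the fixed Gaussian smoothing; the paper's variable-parameter schedule (shrinking the Gaussian kernel while increasing the balance coefficient) sits outside this single step.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def active_demons_step(fixed, moving, u, v, alpha=2.0, kappa=1.0, sigma=1.5):
    """One iteration of an active-demons-style update for 2-D images.
    fixed, moving : images; u, v : current column/row displacement fields.
    alpha : homogenization coefficient; kappa : balance coefficient weighting
    the moving-image (active) force; sigma : Gaussian regularization."""
    yy, xx = np.meshgrid(np.arange(fixed.shape[0]), np.arange(fixed.shape[1]),
                         indexing="ij")
    warped = map_coordinates(moving, [yy + v, xx + u], order=1, mode="nearest")
    diff = warped - fixed
    gfy, gfx = np.gradient(fixed)
    gmy, gmx = np.gradient(warped)
    den_f = gfx**2 + gfy**2 + (alpha * diff)**2 + 1e-12
    den_m = gmx**2 + gmy**2 + (alpha * diff)**2 + 1e-12
    # passive force from the fixed image plus kappa-weighted active force
    ux = -diff * (gfx / den_f + kappa * gmx / den_m)
    uy = -diff * (gfy / den_f + kappa * gmy / den_m)
    u = gaussian_filter(u + ux, sigma)   # Gaussian smoothing regularizes the field
    v = gaussian_filter(v + uy, sigma)
    return u, v
```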
- Combining principal component analysis network with linear discriminant analysis for the classification of retinal optical coherence tomography images. Ding Sijing, Sun Zhongyang, Sun Yankui and Wang Yongge. doi:10.11834/jig.180084
08-01-2019
108
14
Abstract:Objective Optical coherence tomography (OCT) is a 3D scanning imaging technology that has been widely used in ophthalmology as a clinical auxiliary to identify various eye lesions. Therefore, the classification of retinal OCT images is greatly important for the detection and treatment of retinopathy. Many effective OCT classification algorithms have been developed recently, and almost all of them rely on hand-crafted features; however, retinal OCT images acquired in the clinic usually contain complex pathological structures. Therefore, features should be learned directly from the OCT images. Principal component analysis network (PCANet) is a simple version of the convolutional neural network that can directly extract the texture features of images, whereas features extracted by linear discriminant analysis (LDA) are more distinguishable for image classification. Combining the advantages of these two methods, this paper presents a PCANet with LDA (PCANet-LDA) for the automatic classification of three types of retinal OCT images: age-related macular degeneration (AMD), diabetic macular edema (DME), and normal (NOR). Method The proposed PCANet-LDA algorithm adds an LDA supervisory layer to the PCANet so that the extracted image features can be supervised by class labels. The algorithm is implemented in three steps. The first step is OCT image preprocessing, which applies a series of operations, including perceiving, fitting, and normalizing stages, to the retinal OCT images to obtain the retinal region of interest for classification. The second step is PCANet feature extraction, where the preprocessed OCT images are fed into a two-stage PCA convolution layer and a nonlinear output layer. In the PCA convolution layer, PCA filter banks are learned, and the PCA features of the retinal OCT images are extracted. In the nonlinear output layer, the extracted PCA features are translated into PCANet features of the input images by basic data-processing components, including binary hashing and blockwise histograms. The third step is the LDA supervisory layer, which uses the LDA idea to learn an LDA matrix from the PCANet features with the class labels AMD, DME, and NOR. The LDA matrix is then used to project the PCANet features into a low-dimensional space so that the projected features become more distinguishable for classification. Finally, the projected features are used to train a linear support vector machine and classify the retinal OCT images. Result Experiments are conducted on two retinal OCT datasets: a clinic dataset obtained from a hospital and the Duke dataset. First, comparative examples of AMD, DME, and NOR retinal OCT images before and after preprocessing show that the preprocessing cuts out the non-retinal regions of the OCT image and leaves the meaningful retinal areas. Moreover, the remaining retina is rotated to a unified horizontal orientation to reduce the effect of inconsistent retinal direction on classification. Then, the sample PCANet feature maps extracted from AMD and DME retinal OCT images show that the PCA filters trained by PCANet tend to capture meaningful pathological structure information, which contributes to the classification of retinal OCT images. Finally, the correct classification rates of the PCANet algorithm, the ScSPM algorithm, and the proposed PCANet-LDA algorithm are compared.
On the clinic dataset, the overall correct classification rate of the PCANet-LDA algorithm is 97.20%, which is 3.77% higher than that of the PCANet algorithm and slightly higher than that of the ScSPM algorithm. On the Duke dataset, the overall correct classification rate of the PCANet-LDA algorithm is 99.52%, which is 1.64% higher than that of the PCANet algorithm and slightly higher than that of the ScSPM algorithm. Conclusion The PCANet algorithm can extract effective features. Building on this, the PCANet-LDA algorithm obtains more distinguishable features through the LDA method and thus yields a higher correct classification rate than the PCANet and ScSPM algorithms; the latter is a state-of-the-art method for two-dimensional retinal OCT image classification. Therefore, the proposed PCANet-LDA algorithm is effective and advanced in the classification of retinal OCT images and can serve as a baseline algorithm for retinal OCT image classification.
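As an illustration of how an LDA supervisory layer can sit on top of PCANet-style features, the sketch below shows single-stage PCA filter learning from image patches and an LDA projection followed by a linear SVM. It is a simplified stand-in (the actual PCANet uses two convolution stages, binary hashing, and blockwise histograms), and the function names, patch size, and filter count are illustrative assumptions.

```python
# Simplified PCANet-style filter learning plus LDA supervisory layer and linear SVM.
import numpy as np
from sklearn.feature_extraction.image import extract_patches_2d
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import LinearSVC

def learn_pca_filters(images, patch_size=(7, 7), n_filters=8):
    """Learn PCA convolution filters from zero-mean grayscale image patches."""
    patches = np.vstack([
        extract_patches_2d(img, patch_size, max_patches=500)
        .reshape(-1, patch_size[0] * patch_size[1])
        for img in images
    ]).astype(np.float64)
    patches -= patches.mean(axis=1, keepdims=True)   # remove the patch mean
    pca = PCA(n_components=n_filters).fit(patches)
    return pca.components_.reshape(n_filters, *patch_size)

def classify_with_lda(train_features, train_labels, test_features):
    """Project PCANet-style feature vectors with LDA, then classify with a linear SVM."""
    lda = LinearDiscriminantAnalysis()                # supervisory layer (at most C-1 dims)
    z_train = lda.fit_transform(train_features, train_labels)
    svm = LinearSVC().fit(z_train, train_labels)
    return svm.predict(lda.transform(test_features))
```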
- Classification method of vectorization characteristics of pulmonary nodule surface Liu Tong, Xu Jiuqiang, Zhu Hongbo, Meng Zhaoyan and Dou Shengchang doi:10.11834/jig.180317
08-01-2019
78
10
Abstract:Objective In this paper, the spherical surface texture and the nodule shape are vectorized through spherical harmonics and a repulsive mapping algorithm for the benign-malignant determination of pulmonary nodules in chest CT (computed tomography) images. Current deep learning methods for benign-malignant screening of pulmonary nodules neglect data preprocessing and focus mainly on framework improvement. So far, deep learning methods have mainly been oriented toward feature information that can be vectorized. In image processing, targets are mainly handled in two- or three-dimensional form. In two-dimensional processing, the input data must be of equal size. Because the extracted pulmonary nodules differ in size, large images must be compressed and small images stretched at the input stage, which inevitably affects the quality of feature extraction and the final classification results. In three-dimensional processing of pulmonary nodules, the requirements on the CT imaging angle are stricter because of the different sizes of pulmonary nodules and the uncertainty of their growth positions, and the imaging angle is uncontrollable during actual CT acquisition. Hence, if the characteristics of the convolution channel are to be retained, this problem must be solved first, and different pulmonary nodules must be standardized. Different from traditional pulmonary nodule classification methods, the proposed method focuses on how to vectorize the spherical texture and nodule shape so that the data can be input to a deep forest for classification training. Method First, three-dimensional pulmonary nodule images are produced by three-dimensional reconstruction of the data from the Affiliated Hospital of Liaoning University of Traditional Chinese Medicine. The data are divided into training and test sets at an 8:2 ratio. Second, the spherical harmonic function and the repulsive mapping algorithm are used to map the texture onto a standard sphere in a mesh manner while preserving the spatial information. Third, the texture features of the pulmonary nodules are calculated by the mesh-LBP (local binary pattern) method and vectorized by constructing rings. Then, the shape energy loss is constructed from the distance difference between the center of gravity and the central point of the pulmonary nodule during the reconstruction of the three-dimensional nodules, and this shape energy loss is extracted and vectorized during rebuilding. Finally, a mesh-based multi-grained scanning method is proposed to improve the deep forest training framework by constructing multi-scale concentric ordered rings instead of the traditional multi-grained scanning method. The vectorized texture and shape features are fed into the improved deep forest training framework for experimental verification. Result The pulmonary nodule data for this trial were provided mainly by the IRB (institutional review board) of the Affiliated Hospital of Liaoning University of Traditional Chinese Medicine. The approval notice number is 2017111CS(KT)-040-01. According to the growth pattern of the pulmonary nodules, the dataset was divided into four types: isolated pulmonary nodules and ground-glass, vascular-adhesion, and thoracic-adhesion type pulmonary nodules. The benign and malignant lung nodules were divided into five levels: 1-benign, 2-suspected benign, 3-unknown, 4-suspected malignant, and 5-malignant.
A total of 7 326 three-dimensional lung nodules were extracted as the dataset for this experiment. According to several experimental results, this algorithm performs better than existing advanced methods (variational auto-encoder, faster region-based convolutional neural network, gcForest deep forest, deep convolutional generative adversarial networks, and extreme learning machine) under the following indices: accuracy (ACC), specificity (SPE), sensitivity (SEN), and AUC (area under the receiver operating characteristic (ROC) curve). In the experiments, ACC, SPE, SEN, and AUC reached 76.06%, 69.46%, 88.46%, and 0.84, respectively. Accuracy indicates the proportion of positive and negative examples correctly classified and measures the classifier's overall judgment of the samples. Specificity refers to the true negative rate, which indicates the proportion of counterexamples correctly classified and measures the ability of the classifier to identify counterexamples. Sensitivity refers to the true positive rate, which indicates the proportion of positive examples correctly classified and measures the ability of the classifier to identify positive examples. AUC indicates the area under the ROC curve. The ROC curve dynamically displays the classification result, with the true positive rate (sensitivity) as the ordinate and the false positive rate (1-specificity) as the abscissa; a curve closer to the upper left corner indicates a stronger classification ability. The ROC curve can be summarized by the AUC value, which generally lies between 0.5 and 1, and a larger value indicates a stronger classification ability. Comparatively, the proposed method is suboptimal in the SEN and AUC values but achieves the highest SPE and ACC values. Conclusion Lung cancer has one of the fastest growing morbidity and mortality rates worldwide and is a serious threat to people's health and lives. The onset of malignant pulmonary nodules is insidious; many patients are already in the locally advanced stage or even have distant metastases at the time of diagnosis and thus lose the chance of cure. Therefore, the early detection of malignant pulmonary nodules is crucial for a successful cure. Based on the spherical harmonic function and the repulsive mapping algorithm, the surface texture and shape of pulmonary nodules can be successfully vectorized and used for training. In addition to addressing data pretreatment, these two features improve the accuracy of benign-malignant detection of pulmonary nodules. The proposed method solves the problem of angle normalization by obtaining the nodule information through a spherical space mesh. Moreover, the ring-based method yields different texture features and shape edge feature vectors as classification features. It is thus an effective method for feature extraction and vectorization in three-dimensional models.
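For reference, the evaluation indices discussed above can be computed as in the following sketch for a binary benign/malignant decision; the variable names and the 0.5 decision threshold are illustrative assumptions, with `y_score` taken to be the classifier's malignancy probability.

```python
# Computing ACC, SPE, SEN, and AUC for a binary benign/malignant classifier.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

def nodule_metrics(y_true, y_score, threshold=0.5):
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    acc = (tp + tn) / (tp + tn + fp + fn)   # overall correctness
    spe = tn / (tn + fp)                    # true negative rate
    sen = tp / (tp + fn)                    # true positive rate
    auc = roc_auc_score(y_true, y_score)    # area under the ROC curve
    return {"ACC": acc, "SPE": spe, "SEN": sen, "AUC": auc}
```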
- Sample optimized selection of hyperspectral image classification Fang Shuai, Zhu Fengjuan, Dong Zhangyu and Zhang Jing doi:10.11834/jig.180437
08-01-2019
132
9
Abstract:Objective In recent years, an increasing number of applications of remote sensing images have been realized. Hyperspectral image classification is a widely used method in hyperspectral image processing. For the traditional hyperspectral classification problem, mainstream improvements focus on optimizing the classifier or the classification algorithm. This approach does not address the existing limitations; thus, we proposed new improvements for this problem. In other words, while improving the classifier, the sample space was optimized to obtain a group of representative training samples that ensure the overall classification accuracy. The traditional classification method considers only the improvement of the classification algorithm and the classifier and adopts various random selection schemes for sample acquisition. However, the spectral properties of even the same kind of substance differ. In view of these in-class differences in spectral characteristics, the conventional strategy of randomly selecting samples from each substance in a certain proportion cannot guarantee that the selected training samples contain the complete spectral features. To solve this problem, a sample space optimization strategy based on in-class reclustering was proposed. In this way, the selected training samples are guaranteed to contain the various spectral curves of each class, and the selected samples are uniformly distributed over the subclasses of each class. Moreover, to further improve the classification accuracy, the classifier must also be improved. According to ensemble learning, points for which multiple classifiers give the same classification result are classified with higher accuracy, whereas points on which the classifiers disagree have a higher error rate. Therefore, the low-confidence classification results with low accuracy were optimized again by using the high-confidence region with high accuracy. In this paper, the optimization method was a correction strategy based on neighborhood high-confidence information, which used the classification results of the high-confidence region to optimize those of the low-confidence region, thereby improving both the accuracy of the low-confidence region and the overall classification accuracy. Given that the classification strategy used in this paper is pointwise classification, neighborhood information was not considered initially. In fact, the category of a given point is usually the same as that of the surrounding region influenced by the neighborhood information. Therefore, we used an edge-preserving filter to smooth the class information while protecting edges, which ensures the similarity between the information at a given point and that of its neighborhood and further improves the classification accuracy. Method In this paper, the fuzzy C-means clustering algorithm was used to implement in-class reclustering on each class of samples. As the spectral characteristics of samples of the same class differ, the samples of each class were grouped into several subclasses according to the differences in their spectral characteristics. When selecting samples, we ensured that samples were selected from every subclass of every class so that the training set covers the entire sample space.
For the correction strategy based on neighborhood high-confidence information, this paper used an edge-preserving filter to optimize the low-confidence information with the high-confidence region information. First, two simple classifiers, namely, a support vector machine (SVM) classifier and a sparse representation-based classifier (SRC), were used, and the consistency of their classification results was tested. The point set with consistent classification results formed the high-confidence region, and the point set with inconsistent results formed the low-confidence region. Then, the results of the low-confidence region were optimized by using the edge-preserving filter. First, the hyperspectral images were processed by principal component analysis to obtain the first principal component. Given that the first principal component contains most of the structural information of the image, it was used as the guide image. Then, the high-confidence region was filtered, and the high-confidence information was propagated to the small number of points in the low-confidence region. In this way, the low-confidence region obtained new category information to replace the original low-confidence category information, thereby correcting the classification results of the low-confidence region. The edge-preserving filter has the property of edge-preserving smoothness. The aforementioned strategies show that our classification effect is greatly improved. In addition, even when a small proportion of training samples is selected, a strong classifier can still be trained after the sample space is optimized, ensuring the stability of the classification accuracy. Result The experiments used three sets of experimental data, namely, the Indian Pines, Salinas, and PaviaU datasets. Two sets of experiments were set up with different sample selection proportions to compare the classic classification algorithms and the proposed one. In the first experiment, we selected 10%, 1%, and 1% training samples. The experimental results revealed that the overall accuracy (OA) values on the three datasets reached 98.93%, 99.78%, and 99.40%, respectively, which is about 1% higher than those of the other optimal algorithms. In the second, small-sample experiment, we set the sample proportions to 1%, 0.3%, and 0.4%. The OA values for the Indian Pines, Salinas, and PaviaU datasets reached 90.48%, 99.68%, and 98.54%, which were 4%-6% higher than those of the other algorithms at the same proportions. The experimental results suggest that the proposed algorithm is superior to the other algorithms and that the classification accuracy is greatly improved, particularly in experiments with small sample proportions. Conclusion In this paper, a representative and balanced sample set is selected through the sample space optimization strategy to ensure that the classification accuracy remains high for small-scale samples. The correction strategy based on neighborhood high-confidence information offers a good optimization effect. Moreover, the algorithm adapts to many datasets and achieves good robustness. In summary, the results show that reducing the sample proportion leads to a rapid decline and instability in the classification performance of traditional classification algorithms, whereas the proposed algorithm offers obvious advantages, ensuring not only high accuracy but also the stability of the classification results.
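To illustrate the in-class reclustering sample selection, the sketch below splits each class into subclasses and draws training pixels from every subclass so that the training set covers the whole spectral space. KMeans is used here as a stand-in for the fuzzy C-means clustering employed in the paper, and the subclass count and sampling ratio are illustrative assumptions.

```python
# In-class reclustering sample selection (KMeans as a stand-in for fuzzy C-means).
import numpy as np
from sklearn.cluster import KMeans

def select_training_samples(X, y, ratio=0.01, n_sub=4, seed=0):
    """X: (n_pixels, n_bands) spectra; y: class labels. Returns training indices."""
    rng = np.random.default_rng(seed)
    train_idx = []
    for c in np.unique(y):
        idx_c = np.where(y == c)[0]
        # split the class into spectral subclasses
        sub = KMeans(n_clusters=min(n_sub, len(idx_c)), n_init=10,
                     random_state=seed).fit_predict(X[idx_c])
        for s in np.unique(sub):
            idx_s = idx_c[sub == s]
            n_pick = max(1, int(round(ratio * len(idx_s))))
            # draw samples from every subclass so each spectral curve is represented
            train_idx.extend(rng.choice(idx_s, n_pick, replace=False))
    return np.array(train_idx)
```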
- Multi-layer perceptual decomposition based full reference image quality assessment Li Guoqing, Zhao Yang, Liu Qingmeng, Yin Xiangyu and Wang Yenan doi:10.11834/jig.180438
08-01-2019
109
6
Abstract:Objective IQA (image quality assessment) is one of the fundamental research topics in the fields of computer vision and image processing. Traditional quality assessment methods are mainly based on low-level visual features and generally ignore high-level semantic information. Traditional IQA methods mainly rely on single-pixel intensities or low-level visual features, such as image contrast and image edges, to assess images. PSNR (peak signal-to-noise ratio) is a basic and commonly used tool that directly compares the differences in pixel intensities between the test image and the reference image. By contrast, the human visual system extracts structural information from visual scenes, so PSNR cannot accurately measure subjective visual quality. To extract the structural information and attain a better evaluation, various improved IQA methods have been proposed. Many of them first decompose an image into different aspects to extract information that effectively measures image quality. However, these traditional methods still omit high-level semantic information. With the rapid development of deep learning algorithms, high-level semantic information can be effectively extracted by deep networks. Given their special hierarchical structure, deep networks can analyze and understand images at different levels. In recent years, perceptual loss based on deep networks has been widely used in many computer vision applications, such as image style transfer, non-photorealistic rendering, and image restoration. By utilizing a pre-trained deep network to decompose an image into different semantic levels, satisfactory results can be produced for related tasks. Inspired by the perceptual loss, we propose a multi-layer perceptual decomposition-based full-reference image quality assessment method. Method First, a pre-trained deep network was used to decompose the input image and extract multi-layer feature maps. Many pre-trained deep networks could be employed for this purpose; on the basis of previous studies on perceptual loss, the VGG-19 network was selected because of its effectiveness. VGG-19 is composed of several types of layers, including convolutional, activation function, pooling, dropout, fully connected, and softmax layers. These elements are stacked in a specific order to form a complete network model. This network has been widely applied because it achieves impressive results in many recognition tasks. To reduce complexity, several layers were designated as abstraction layers for extracting feature maps. Second, the proposed method calculates not only the similarity between the test image and the reference image but also the similarity between their multi-level feature maps. The feature maps at lower levels reflect the differences of the images in edges, details, textures, and other low-level features, whereas the feature maps at higher levels reflect the saliency and semantic differences of the images in the regions of interest. Finally, an image quality score that considers the similarity of high-level semantics is obtained. In contrast to existing DNN (deep neural network)-based IQA methods, the pre-trained deep network is merely utilized to decompose the image rather than to fit the subjective mean opinion scores. Thus, the proposed method does not need to train a new IQA network, unlike other DNN-based methods.
Moreover, the proposed method is an open and elastic framework that improves the performance of traditional methods by extracting additional high-level semantic information. Therefore, numerous traditional full-reference IQA methods can be further improved by exploiting the proposed framework. In this paper, a number of typical and efficient traditional IQA methods were improved and evaluated with the proposed method. These IQA methods include PSNR, SSIM (structural similarity), and its two effective variants, namely, MS-SSIM (multi-scale structural similarity) and FSIM (feature similarity). Other full-reference IQA methods can also be improved by the proposed semantic decomposition-based framework. Result The experimental data were derived from the TID2013 dataset, which includes 25 reference images and 3 000 distorted images. Compared with other existing databases, TID2013 has more images and distortion types, guaranteeing more reliable results. The experimental results for the selected traditional methods, namely, PSNR, SSIM, MS-SSIM, and FSIM, show that the proposed method effectively improves the performance of traditional image quality assessment methods and achieves corresponding improvements in many objective criteria, such as SRCC (Spearman rank-order correlation coefficient), KRCC (Kendall rank-order correlation coefficient), PLCC (Pearson linear correlation coefficient), and RMSE (root mean squared error). The SRCC indicators increased by 0.02, 0.07, 0.06, and 0.04 for PSNR, SSIM, MS-SSIM, and FSIM, respectively, on the TID2013 dataset. SRCC and KRCC measure prediction monotonicity, PLCC measures prediction accuracy, and RMSE measures prediction consistency. The traditional assessments attain higher SRCC, KRCC, and PLCC values when the proposed method is used. For RMSE, the proposed methods achieve much lower values than the corresponding conventional IQA methods. In addition, the results for different distortion types demonstrate that the proposed method can effectively improve performance. Conclusion This paper proposed a full-reference image quality assessment method based on perceptual decomposition that combines the benefits of traditional methods and deep learning methods. By simultaneously considering low-level visual features and high-level semantic information, the proposed method effectively improves the evaluation performance of traditional methods. By incorporating the additional high-level semantic information, the IQA results become more consistent with subjective visual perception. Furthermore, the proposed evaluation framework can also be applied to other traditional full-reference IQA methods.
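To illustrate the multi-layer perceptual decomposition, the sketch below uses a frozen pre-trained VGG-19 to extract feature maps at several depths and applies a base full-reference metric (PSNR, for brevity) on the pixels and on each feature level before pooling the scores. The chosen layer indices, the equal-weight pooling, and the use of PSNR as the base metric are illustrative assumptions rather than the paper's exact configuration.

```python
# Multi-layer perceptual decomposition with a frozen VGG-19 (torchvision >= 0.13).
import torch
import torchvision.models as models

_vgg = models.vgg19(weights="IMAGENET1K_V1").features.eval()
for p in _vgg.parameters():
    p.requires_grad_(False)

LAYERS = (3, 8, 17, 26)  # assumed abstraction layers (relu1_2, relu2_2, relu3_4, relu4_4)

def _feature_maps(x):
    """Collect feature maps at the selected abstraction layers."""
    feats, h, last = [], x, max(LAYERS)
    for i, layer in enumerate(_vgg):
        h = layer(h)
        if i in LAYERS:
            feats.append(h)
        if i == last:
            break
    return feats

def _psnr(a, b, eps=1e-12):
    """PSNR-style similarity usable on pixels or feature maps."""
    mse = torch.mean((a - b) ** 2)
    peak = torch.max(torch.abs(a).max(), torch.abs(b).max())
    return 10 * torch.log10(peak ** 2 / (mse + eps))

def perceptual_iqa(ref, test):
    """ref, test: (1, 3, H, W) tensors in [0, 1]. Returns a pooled quality score."""
    scores = [_psnr(ref, test)]  # pixel-level term
    for fr, ft in zip(_feature_maps(ref), _feature_maps(test)):
        scores.append(_psnr(fr, ft))  # per-level semantic terms
    return torch.stack(scores).mean()
```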