Abstract: Objective: Watermarking algorithm experiments are usually used as a standard to determine intensity parameters. However, the experimental workload is considerable and the results are stochastic. A watermark is embedded in the original image in a way that must coordinate invisibility and robustness: good invisibility of the watermarked image tends to reduce the robustness of the watermark. A good watermark hiding technology must consider both the invisibility of the watermarked image and the robustness of the watermark and must resist various common attacks at the same time. An adaptive watermarking algorithm based on image blocks is proposed to balance invisibility and robustness.
Method: Scale-invariant feature transform (SIFT) is utilized to extract robust feature points from the original image, and the regions around them serve as the watermark-embedding area. Before the embedding operation, the feature points are filtered to eliminate points with low contrast and edge points, so the remaining feature points have high stability. Two watermarking experiments with different watermark sizes are conducted. The number of extracted feature points is positively correlated with the watermark size. When the watermark is extracted, the same points are used again to locate it. The feature points of the original carrier image are extracted to form a matrix with the same size as that of the watermark image. The extracted embedding region is divided into four equal and non-overlapping image blocks. Each image block is decomposed by singular value decomposition (SVD) to obtain two orthogonal matrices U and V and a diagonal matrix S. Matrix S is superimposed with the subband of the first-level discrete wavelet transform of the watermark; in this way, the encrypted watermark is embedded. The watermark matrix is reorganized, and the feature points are then restored to the original image. The magnitude of the embedding strength affects the quality of the extracted watermark. If the embedding intensity is defined as a constant, it cannot be applied to every type of experiment, and a randomly chosen value likewise affects the embedding effect of the watermark. The fitness function of the fruit fly optimization algorithm (FOA) is therefore used for adaptive embedding of the watermark image. The senses of smell and sight of fruit flies are superior to those of other species: when flies recognize the smell of food, they fly in the direction of a target or partner, and the FOA simulates this behavior. Two sets of watermarks with different sizes are embedded in this study. Two groups of objective functions are selected through several experiments to determine the watermark embedding strength and to achieve a good embedding effect. The FOA is used to balance the invisibility and robustness of the watermark. Watermark detection can be performed directly on attacked images without correction.
Result: Multi-standard experiments are conducted on standard gray images. Standard gray images of Lena, Baboon, and Plane are selected as the original images, and two binary images of "word" and "Liaoning Technical University" are used as watermark images. Peak signal-to-noise ratio (PSNR) is used to measure the imperceptibility of the watermark; a high PSNR indicates high imperceptibility. The normalized correlation coefficient (NC) is adopted to evaluate the similarity between the original and extracted watermarks; a large NC value means high robustness of the watermark. For a 16×16 watermark image, the optimal embedding intensities of the three gray images, Lena, Baboon, and Plane, are 0.302 7, 0.349 6, and 0.377 3, respectively, and the PSNR of the watermarked image is above 45 dB. For a 32×32 watermark image, the optimal embedding intensities of the three gray images are 0.240 1, 0.251 1, and 0.217 6, respectively, and the PSNR of the watermarked image is above 43 dB. These values show that the algorithm achieves good invisibility. When the watermarked image is not attacked, the extracted watermark images have NC values above 0.99, indicating high watermark robustness. The algorithm should also resist external attacks, so the three watermarked gray images are simulated under noise, compression, shear, and rotation attacks. The 16×16 watermark image under the different attacks presents watermark NC values above 0.94, so the attacks exert minimal influence on the quality of watermark extraction. Rotation, translation, and other geometric attacks on the 16×16 watermark image lead to NC values below 1 because the watermarked image loses some pixels during rotation and translation; therefore, not all pixels can be recovered during watermark extraction, resulting in an incomplete watermark. For compression attacks, the resulting NC values increase as the attack parameters increase. Under noise attacks of different intensities, the NC values of the watermark can reach more than 0.98, which shows that the algorithm has a great advantage in resisting noise attacks. The embedded 32×32 watermark image exhibits a similar effect under the same attacks, and the general trend of NC values under attacks is the same. The two groups of attack experiments show watermark NC values greater than 0.94, which indicates that the algorithm can effectively resist a range of common attacks.
Conclusion: The feature points extracted by SIFT have a certain stability even when the local characteristics of watermarked images are affected by various geometric attacks, such as rotation and shear. The watermark embedding locations used by the algorithm are a subset of the feature points extracted from the original image and are scattered; as a result, the watermark is not visually significant. SIFT is adopted to realize locally embedded image blocks, and the SVD algorithm optimizes the watermark-embedding performance at the extracted stable feature points. The FOA allows each experiment group to yield adaptive intensity parameters and achieve the best embedding effect without the need to compare parameters manually. This algorithm intelligently searches for optimal solutions that balance the invisibility and robustness of watermarks.
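To make the evaluation and embedding steps above concrete, the following is a minimal numpy sketch of the PSNR and NC measures used in the Result section and of one additive SVD-domain embedding step per image block; the function names, the `alpha` strength parameter, and the assumption that the wavelet coefficient vector is at least as long as the singular-value vector are illustrative choices, not the paper's code.

```python
import numpy as np

def psnr(original, marked, peak=255.0):
    # imperceptibility: higher PSNR means the watermarked image is closer to the original
    mse = np.mean((original.astype(np.float64) - marked.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def nc(wm_ref, wm_ext):
    # robustness: normalized correlation between the original and extracted watermarks
    a = wm_ref.astype(np.float64).ravel()
    b = wm_ext.astype(np.float64).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def embed_block_svd(block, wm_coeffs, alpha):
    # additive embedding of (encrypted, DWT-transformed) watermark coefficients into the
    # singular values of one image block; wm_coeffs is assumed to be at least as long
    # as the singular-value vector of the block
    U, S, Vt = np.linalg.svd(block.astype(np.float64), full_matrices=False)
    S_marked = S + alpha * np.asarray(wm_coeffs, dtype=np.float64)[:S.size]
    return U @ np.diag(S_marked) @ Vt
```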
Abstract: Objective: Typical registration methods for point cloud data have high demands on the initial positions and overlapping degree of the two point cloud datasets to be registered. For instance, the iterative closest point (ICP) algorithm establishes the correspondence relationship by searching the closest points, usually measured in Euclidean distance, between two point sets. Poor initial positions of the two point cloud sets commonly lead to many erroneous correspondences, so only local optimal solutions can be obtained rather than the global optimal solution. In addition, the ICP algorithm has high requirements on the overlapping degree between the two point clouds to be registered. To tackle this problem, we propose a step-by-step registration algorithm for scattered point cloud data.
Method: Different from most existing curvature-based registration algorithms, our proposed method consists of a sequential filtering process for erroneously matched point pairs and a parameter estimation process based on the Hough transform. In the filtering process, the point cloud curvature and other geometric derivative information are calculated, and candidate matched point pairs from the two clouds are extracted on the basis of curvature. For further elimination of erroneously matched point pairs, these candidates are sequentially filtered on the basis of two similarity measures for the invariant signature and the persistence feature histogram. In the parameter estimation stage, the Hough transform is used to remove the contributions of the erroneously matched point pairs and to determine the final transform for registration upon parameterizing the rotation matrices and translation vectors of these point pairs.
Result: The proposed algorithm is used to register the bunny point cloud data downloaded from the official website of Stanford University for validation. To do so, the entire point cloud is initially divided into two overlapping point cloud datasets with an overlapping degree of approximately 22%, and one of the datasets is subjected to arbitrary translation and rotation operations. Our proposed algorithm is then employed for the registration of the two partially overlapped datasets. The initial matching point set is extracted on the basis of curvature similarity, and the similarity measures of the invariant features are then used as the first filter to eliminate erroneously matched point pairs. Experimental results corroborate that an approximately twofold improvement in the accuracy of the matching points is achieved upon employing the filter based on the invariant feature. Subsequently, the similarity measures of the continuous feature histogram are used as the second filter for the previously obtained matching point pairs; the results of this procedure affirm that an approximately twofold improvement in the average accuracy is achieved. Finally, clustering analysis based on the Hough transform is performed to obtain the final coordinate transformation parameters from the matching point pairs and to further eliminate the contribution of outlier matches to the transformation parameters. Our proposed algorithm, which uses the Hough transform, is compared with the classical Ransac algorithm. The number of random samples is set to k=1 000, and the threshold for the distance from a point to the spatial line is set to 0.08 to match the parameters in the sphere space. At the 1% noise level, the matching accuracy of the Hough transform-based algorithm is 92.67%, whereas the average accuracy of the Ransac algorithm is 88.2%. At the 5% noise level, 50% matching accuracy can be achieved with the Hough transform-based algorithm, whereas that of the Ransac algorithm is 18.4%. Additionally, at each noise level, the results from individual runs of the Ransac algorithm cannot be repeated, implying that the stability of the Ransac algorithm is poor. Thus, our method outperforms the classical Ransac algorithm in accuracy, stability, and robustness to noise. Moreover, our method runs faster than the Ransac-based approach.
Conclusion: Our proposed algorithm can be used for the registration of partially overlapped point clouds with any relative deviation and achieves high accuracy and robustness to noise. The proposed algorithm is somewhat time-consuming because it requires step-by-step implementations. Our future work will concentrate on investigating point cloud data reduction strategies and their corresponding combination methods to further improve the efficiency of the algorithm.
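As a rough illustration of the pipeline above, the sketch below pairs points by curvature similarity and then estimates a rigid transform from the surviving pairs with a closed-form SVD (Kabsch) solution; the Kabsch step is only a stand-in for the paper's Hough-space voting, and the curvature arrays, tolerance, and helper names are assumptions.

```python
import numpy as np

def candidate_pairs_by_curvature(curv_src, curv_tgt, tol=0.02):
    # pair each source point with the target point of most similar curvature,
    # keeping only pairs whose curvature difference is below tol
    pairs = []
    for i, c in enumerate(curv_src):
        j = int(np.argmin(np.abs(curv_tgt - c)))
        if abs(curv_tgt[j] - c) < tol:
            pairs.append((i, j))
    return pairs

def rigid_transform_from_pairs(P, Q):
    # closed-form (Kabsch) estimate of R, t mapping P onto Q; a stand-in here for the
    # Hough-transform clustering of per-pair rotation/translation parameters
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # fix an improper rotation (reflection)
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cq - R @ cp
    return R, t
```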
Abstract: Objective: Image retargeting, which has become an increasingly in-demand tool with the proliferation of mobile devices, aims to adjust images into different sizes or aspect ratios for various display screens. Many retargeting methods have been proposed during the past few years, but a single method that works efficiently on any image still does not exist. Different images favor different retargeting algorithms, and a key problem is to estimate the performance of each retargeting operator. Image retargeting quality assessment (IRQA) is an effective way to improve the performance of image retargeting techniques and can be utilized to select favorable retargeting approaches for real applications. Nevertheless, objective IRQA remains a challenging research problem. First, the resolution of a retargeted image differs from that of its original image; thus, the problems of IRQA differ from those of traditional image quality assessment (IQA). For example, traditional full-reference IQA methods, such as the structural similarity and feature similarity indices, measure pixel-to-pixel similarity to obtain image quality scores and thus cannot be directly applied to IRQA. Second, traditional IQA metrics mainly focus on estimating the perceptual similarity between a source image and its corresponding non-geometrically distorted version. With respect to the IRQA problem, the perceptual quality of a retargeted image is closely related to human cognition of this image: the structure and semantic information of an object should be consistent with the prior knowledge of humans. We propose a new method for IRQA via bidirectional similarity transformation to accurately evaluate the quality of retargeted images.
Method: Geometric distortion and information loss are two important issues in image retargeting. We propose a novel metric to quantify the geometric distortion and information loss of a retargeted image. Instead of only establishing a pseudo mapping relationship between the retargeted image and its original image, we regenerate the retargeted image from the original image and, inversely, regenerate the original image from the retargeted image. The issue of pixel matching is then converted into one of similarity transformation. We use the scale-invariant feature transform (SIFT)-flow algorithm to extract a dense SIFT descriptor for each pixel in the original and retargeted images to build a reliable matching relation between the images. As a result, pixel-wise correspondences are established, and forward and backward similarity transformation matrices can then be calculated from the corresponding mesh vertex coordinates. The similarity transformation matrix contains important information about the image retargeting process, controls mesh deformation, and is a decisive factor for geometric distortion and information loss. Geometric distortion is calculated in this study from the distance between a similarity transformation matrix and the benchmark transformation matrix. Rotation and scaling parameters can reflect geometric distortion; hence, the distance defined in this work is composed of two components, the absolute distance difference and the aspect change, which have not been applied in previous methods. A large distance between the estimated and benchmark transformation matrices usually means large information loss. However, if a salient object is discarded or cropped, the above geometric distortion measurement cannot correctly reflect such a type of information loss. Therefore, the pixels in the original or retargeted image are mapped to the opposite image with the forward or backward transformation, and the information loss is calculated from the missing areas. The quality of the retargeted image is obtained from the geometric distortion and information loss of the forward and backward transformations.
Result: Experimental results on the publicly available RetargetMe and CUHK datasets demonstrate the superiority of the proposed method. On the RetargetMe dataset, the Kendall rank correlation coefficient of the method reaches 0.46, and the method produces a good evaluation output on each subclass, especially on foreground objects, texture, and geometric structure. On the CUHK database, the Spearman rank-order correlation coefficient is above 0.71. The results of one-way matching, either forward or backward, are worse than those of bidirectional matching because they are more affected by matching error. In this method, grid size affects the similarity transformation and hence also influences the geometric distortion and information loss measurements. We test different grid sizes, and the results indicate that the choice of grid size has a certain effect on quality prediction; a grid size of 16×16 gives relatively high performance on the CUHK and RetargetMe databases.
Conclusion: We present a novel IRQA method based on bidirectional similarity transformation. Unlike traditional IRQA metrics that only estimate the matching similarity between original and retargeted images, we regenerate the retargeted image from the original image and the original image from the retargeted image to extract effective features from the similarity transformation matrix and the loss of mesh areas. The major contribution of this study is that it considers the retargeting operator as a process of image similarity transformation. The similarity transformation matrix connects the original and retargeted images and exerts a great influence on retargeted image quality. Features extracted from the similarity transformation matrix can measure geometric distortion accurately. Geometric change can also lead to information loss when partial information is discarded rather than preserved; thus, features extracted from the reduced mesh area can reliably reflect the information loss of each retargeted image. The bidirectional matching mechanism that we employ can effectively reduce the influence of pixel matching error. Our quality assessment method therefore correlates better with subjective scores, outperforms existing methods, and is suitable for image retargeting quality prediction and for optimizing retargeting algorithms.
Keywords: image retargeting quality assessment; similarity transformation; bidirectional matching; geometric distortion; information loss
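A small sketch of the core computation described in the Method paragraph, fitting a 2D similarity transform (scale, rotation, translation) to matched mesh vertices and scoring its deviation from a benchmark, may help; the complex-number least-squares fit and the distortion score are illustrative simplifications, not the paper's exact formulation.

```python
import numpy as np

def fit_similarity_2d(src, dst):
    # least-squares similarity transform dst ≈ s*R*src + t from matched mesh vertices,
    # using the complex-number form (a + ib) for the scaled rotation
    src_c, dst_c = src - src.mean(axis=0), dst - dst.mean(axis=0)
    zs = src_c[:, 0] + 1j * src_c[:, 1]
    zd = dst_c[:, 0] + 1j * dst_c[:, 1]
    coef = np.vdot(zs, zd) / np.vdot(zs, zs)        # a + ib
    a, b = coef.real, coef.imag
    M = np.array([[a, -b], [b, a]])                 # scale * rotation
    t = dst.mean(axis=0) - M @ src.mean(axis=0)
    return M, t

def geometric_distortion(M, target_scale=1.0):
    # illustrative score: deviation of the fitted scale and rotation from a benchmark
    scale = np.hypot(M[0, 0], M[1, 0])
    angle = np.arctan2(M[1, 0], M[0, 0])
    return abs(scale - target_scale) + abs(angle)
```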
Abstract: Objective: With the development of 3D content acquisition and display technologies in recent years, three-dimensional (3D) video has received increasing attention. The multi-view video plus depth format is the main representation of a 3D scene. In the 3D extension of high-efficiency video coding (3D-HEVC), the main framework for depth video is similar to that of HEVC: each coding unit (CU) is recursively divided into four sub-CUs, and each CU depth level enables 37 types of intra modes in intra frames. Unlike conventional texture videos, depth videos are not used for watching but for virtual view rendering, so the preservation of sharp object edges is important for depth video compression. Several new techniques, such as the depth modeling mode (DMM) and view synthesis optimization, are introduced into the current 3D-HEVC test model to improve the efficiency of depth video intra coding and the quality of synthesized views. These techniques improve the coding efficiency of depth videos. However, they greatly increase the computational complexity of depth intra coding, thereby hindering real-time applications of 3D-HEVC. Depth videos are also inaccurate and inconsistent because of the limitations of mainstream capture technologies, and this inaccuracy further increases the computational complexity of intra coding. Previous research on low-complexity depth video intra coding and on depth video enhancement has been conducted separately. Thus, a joint depth video enhancement and fast intra-coding algorithm is proposed in this study.
Method: An enhancement method is applied before encoding to remove inaccurate textures in a depth video and to enhance its spatial correlation. The edge regions are preserved for rendering quality, whereas Gaussian and adaptive-window smoothing filters are applied to non-edge regions. The enhanced depth video is mainly characterized by sharp object edges and large areas of nearly constant regions. By fully exploiting these features, we can skip prediction modes and CU depth levels that are rarely used in homogeneous regions. CUs are classified according to texture complexity, and the partition process of CUs with low texture complexity is terminated early. Prediction units (PUs) are classified according to edge intensity, and the proposed algorithm selectively omits unnecessary DMM evaluations in the mode decision process on the basis of the PU classification results. The algorithm is implemented on HTM-16.0, the reference software of the 3D-HEVC standard, and tested under the common test conditions required by the Joint Collaborative Team on 3D Video Coding to evaluate its performance. The proposed algorithm is specially designed for depth videos estimated by stereo matching; thus, sequences synthesized by a computer are not tested. The proposed scheme targets depth video intra coding, so all test sequences are coded with an intra-only structure and a three-view configuration. The rate-distortion performance of the proposed algorithm is evaluated by using the Bjontegaard delta bitrate, which is calculated from the peak signal-to-noise ratio of the synthesized view quality and the total bitrate, including color and depth videos.
Result: Experimental results show that the proposed algorithm significantly saves the encoding time of the depth video and reduces the bitrate under the same synthesized virtual view quality. The coding time reduction obtained by the proposed algorithm compared with the original 3D-HEVC encoder ranges from 61.35% to 65.73% and is 62.91% on average. In terms of coding efficiency, our proposed algorithm can reduce the bitrate by 4.63% under the same synthesized virtual view quality, with maximal and minimal reductions of 8.10% and 2.60%, respectively. The subjective quality of the proposed algorithm is also significantly improved compared with that of the original 3D-HEVC encoder. The significant performance improvement of depth video coding is contributed by both the depth video enhancement and the fast algorithm. The proposed algorithm is superior to the state-of-the-art fast depth intra-coding algorithm: the encoding time saving of depth video is further increased by 26.10%, and the bitrate is further reduced by 5.20% under the same synthesized virtual view quality.
Conclusion: A joint depth video enhancement and fast intra-coding algorithm is proposed to solve two problems in 3D-HEVC: the high computational complexity of depth video intra coding and the inaccuracy of depth videos. The proposed enhancement method improves the spatial correlation of depth videos, and the fast intra-coding scheme significantly reduces their encoding time. Therefore, the proposed method not only reduces encoding time but also improves the compression performance of depth video intra coding.
Keywords: 3D extension of high efficiency video coding (3D-HEVC); depth video enhancement; fast intra coding; texture complexity; edge intensity
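The fast intra-coding decisions described above reduce to two simple tests per block: early CU-split termination for low-texture blocks and skipping DMM evaluation for PUs without strong edges. A hedged sketch follows; the variance and gradient thresholds are illustrative placeholders, not the values used in the paper.

```python
import numpy as np

def cu_is_homogeneous(cu, var_thresh=2.0):
    # early termination: skip further CU splitting when the depth block is nearly constant
    return float(np.var(cu.astype(np.float64))) < var_thresh

def pu_has_strong_edge(pu, grad_thresh=30.0):
    # decide whether DMM modes are worth testing, using a simple mean gradient magnitude
    gy, gx = np.gradient(pu.astype(np.float64))
    return float(np.mean(np.hypot(gx, gy))) > grad_thresh
```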
Abstract: Objective: Feature extraction plays an important role in image recognition, including face recognition and character recognition. However, feature extraction usually depends on domain knowledge or prior experience. Convolutional neural networks (CNNs) are attracting the attention of researchers because they can automatically extract efficient features and simultaneously achieve image recognition via their self-learning capability. However, the features learned by classical CNNs often have poor discriminant capability for image recognition. This study introduces a linear discriminant analysis loss (LDloss) into CNNs and develops a discriminative deep feature learning method that fuses linear discriminant analysis (LDA) for image recognition. CNNs can thereby provide discriminant features that improve image recognition performance.
Method: A deep CNN for feature extraction is constructed to automatically extract efficient features for image classification tasks via a multilayer perceptron. LDA is introduced, and a new linear discriminant loss function (LDloss) is developed via a variant form of the Fisher criterion. The new LDloss and the softmax loss are integrated into a united loss function for deep feature learning. The learned features can minimize classification error and simultaneously achieve inter-class dispersion and intra-class compactness. An average strategy based on mini-batches is used to update the class centers during learning.
Result: Experimental results on the MNIST and CK+ databases show that the proposed algorithm achieves average recognition rates of 99.53% and 94.73%, respectively. Compared with the softmax and hinge losses under the same network structure, LDloss achieves increases of 0.2% and 0.3% on the MNIST database and of 9.21% and 24.28% on the CK+ database, respectively. The proposed method achieves a 100% recognition rate for some classes in the MNIST and CK+ databases.
Conclusion: This study proposes a new discriminant deep feature learning algorithm for image recognition. The experimental results on different databases show that the proposed method can explicitly achieve intra-class compactness and inter-class separability among the learned features and thus efficiently improve their discriminative capability. Therefore, the proposed method achieves higher recognition accuracy than some existing methods. The proposed method also incurs no additional computational load compared with the softmax loss during the testing stage.
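A minimal PyTorch sketch of a Fisher-style discriminant loss of the kind described above, with mini-batch class-center updates, is shown below; the exact variant of the Fisher criterion, the center-update momentum, and the weighting against the softmax loss are assumptions rather than the paper's definitions.

```python
import torch

def ld_loss(features, labels, centers, eps=1e-6):
    # intra-class compactness over inter-class separation, computed on deep features
    intra = ((features - centers[labels]) ** 2).sum(dim=1).mean()
    uniq = torch.unique(labels)
    c = centers[uniq]
    inter = torch.pdist(c).pow(2).mean() if c.size(0) > 1 else features.new_tensor(1.0)
    return intra / (inter + eps)

def update_centers(centers, features, labels, momentum=0.9):
    # mini-batch average update of the class centers (kept out of the autograd graph)
    with torch.no_grad():
        for c in torch.unique(labels):
            centers[c] = momentum * centers[c] + (1 - momentum) * features[labels == c].mean(dim=0)
    return centers

# united objective (illustrative): total = softmax_cross_entropy + lambda_ld * ld_loss(...)
```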
Abstract: Objective: Human action recognition based on 3D skeletons has been a popular topic in computer vision; its goal is to automatically segment, capture, and recognize human actions. Human action recognition has been widely applied in real-world applications. Over the past several decades, it has been used in surveillance, video games, robotics, human-human interaction, human-computer interaction, and health care, and it has been widely explored by researchers since the 1960s. Three-dimensional data can be obtained in four ways: with a marker-based motion capture system; by reconstructing 3D information from 2D image sequences of multiple views; with range sensors; and from RGB videos. However, extracting data by using a motion capture system or by reconstruction is inconvenient. Range sensors are expensive and difficult to use in a human environment, obtain data slowly, and provide poorly estimated distances. Moreover, RGB images usually provide only the appearance information of the objects in the scene. Given the limited information provided by RGB images, solving certain problems, such as separating foreground and background with similar colors and textures, is difficult, if not impossible. RGB data are also highly sensitive to various factors, such as illumination, viewpoint, occlusions, clutter, and dataset diversity, and RGB video sensor data cannot capture all the information that humans need. The rapid development of depth sensors, such as the Microsoft Kinect 3D sensor, in recent years has provided not only color image data but also 3D depth image information. Three-dimensional depth images record the distance between the object and the sensor, thereby producing considerable information. A real-time skeletal-tracking technique together with a support vector machine can recognize various postures and extract key information. The investigation of computer vision algorithms based on 3D skeletons has thus attracted significant attention in the last few years, and many researchers have been studying skeleton-based algorithms, which have yielded numerous achievements and contributions. However, existing action recognition algorithms select a fixed joint as the coordinate center, which leads to a low recognition rate. An adaptive skeleton-center algorithm for human action recognition is proposed to solve this problem of low accuracy.
Method: In the algorithm, the frames of skeleton action sequences are loaded from a human action dataset, redundant frames are removed from the sequence information, and the original coordinate matrix is obtained by preprocessing the sequences. Rigid vector and joint angle features are generated from the original coordinate matrix. An adaptive value is determined on the basis of the changes in the rigid vector and joint angle values. The coordinate center is adaptively selected according to this adaptive value and used to renormalize the original matrix. The action coordinate matrix is denoised by using a dynamic time-planning method, and the Fourier temporal pyramid method is used to reduce the time displacement and noise problems of the action coordinate matrix. The matrix is then classified by using a support vector machine.
Result: Compared with existing algorithms, such as the histogram of 3D joints (HO3DJ), conditional random field (CRF), EigenJoints, profile hidden Markov model (HMM), relation matrix of 3D rigid bodies + principal geodesic distance, and actionlet algorithms, the proposed algorithm exhibits improved performance on different datasets. On the UTKinect dataset, the action recognition rate of the proposed algorithm is 4.28% higher than that of the HO3DJ algorithm and 3.48% higher than that of the CRF algorithm. On the MSRAction3D dataset, the action recognition rate of the proposed algorithm is 9.57% higher than that of the HO3DJ algorithm, 2.07% higher than that of the profile HMM algorithm, and 6.17% higher than that of the EigenJoints algorithm. Action Set (AS) 1, AS2, and AS3 are subsets of the MSRAction3D dataset. The action recognition rate of the proposed algorithm on the AS2 subset is not as good as those of the other algorithms, but its recognition rates on the AS1 and AS3 subsets are high.
Conclusion: The proposed algorithm addresses the low accuracy of existing action recognition algorithms, which adopt a fixed joint as the coordinate center. Simulation results show that the proposed algorithm can effectively improve the accuracy of human action recognition, and its recognition rate is higher than those of existing algorithms. On the UTKinect dataset, the recognition rate of the proposed algorithm is at least 3% higher than those of other algorithms, and the single-action recognition rate reaches 90%. On the MSRAction3D dataset, the proposed algorithm shows advantages on the AS1 and AS3 subsets, but its recognition rate on AS2 is not ideal, particularly in the recognition of upper-limb actions; therefore, this algorithm still needs improvement. The algorithm is generally efficient for single-action recognition, and complex action recognition is the next research direction.
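As a rough stand-in for the adaptive-center step described above, the sketch below picks the joint whose position varies least over the sequence as the coordinate center and renormalizes the coordinates to it; the paper derives its adaptive value from rigid-vector and joint-angle changes, which are not reproduced here, so this is only an illustrative simplification.

```python
import numpy as np

def adaptive_center(skeleton_seq):
    # skeleton_seq: (T, J, 3) joint coordinates over T frames
    # pick the joint whose position varies least across the sequence as the coordinate center
    motion = skeleton_seq.var(axis=0).sum(axis=1)          # per-joint motion, shape (J,)
    return int(np.argmin(motion))

def renormalize(skeleton_seq, center_idx):
    # express all joints relative to the adaptively selected center joint
    return skeleton_seq - skeleton_seq[:, center_idx:center_idx + 1, :]
```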
Abstract: Objective: The application of unmanned surface vehicles in inland rivers has broad prospects, such as water quality monitoring and hydrographic surveying and mapping. However, existing visual research on unmanned surface vehicles is mostly based on the sea environment. When unmanned surface vehicles navigate autonomously, the shoreline of an inland river is the counterpart of the skyline detected in the sea environment and is of great significance for visual navigation. The shoreline can be used for image partitioning, locating the water surface area, obstacle avoidance and navigation, and estimating the motion state of unmanned surface vehicles; it is an important reference for their autonomous navigation. Although a shoreline is similar to a skyline, its background is more complex because of the influence of water waves, reflected light, and inverted images, so existing skyline detection methods are unsuitable for shoreline detection. Inspired by the color perception mechanism of the human visual system, we propose a shoreline detection method based on the hue, saturation, and value (HSV) color space.
Method: We collect water surface images with the camera of an unmanned surface vehicle and analyze the image features in the HSV color space. The saturation of the land area is higher than that of the sky and water areas. When the lighting is dark, the hue information cannot be used; the land area in such an image is darker than other areas, but the image can still be used for shoreline detection. On the basis of this analysis, we propose a shoreline detection method that combines water image features in the HSV space. First, we transform an RGB image into the HSV color space after Gaussian filtering; the Gaussian model can effectively suppress the interference caused by illumination changes and the disturbance of the water image. Components in the HSV color space are selected according to the weight of the land area features, and the selected components are enhanced nonlinearly to improve their contrast. Second, we segment the enhanced image and define each region as a bottom image. Third, we analyze the features of the rows and columns in the original saturation image and extract land areas with high saturation as template images. We cover each bottom image with the template image and select bottom images by the overlap area ratio. After overlapping the selected bottom images, we obtain the final land area image. Finally, we obtain the shoreline by using an edge detection operator and removing the line where the sky and land overlap at the top of the image.
Result: Water surface images with different light intensities in different seasons are collected. For images taken at midday in spring and autumn, the detected shorelines are clear and complete. For images taken in the afternoon, the backgrounds of the surface images are complex because of poor lighting, but shorelines can still be detected accurately. For reflection images of water surfaces used in a comparative experiment, the proposed method can effectively remove the reflection interference of the water surface. For images of water surfaces at sunset, when the light tends to be red, the proposed method is unaffected by the red light. For images of water surfaces with sun reflection, the proposed method can remove the sun reflection. Therefore, the proposed method has a strong anti-interference capability. The experimental results reveal that the method can accurately detect shorelines in different light environments and ensure that the contours of the shorelines are clear and complete. The processing speed of the method reaches 1 frame/s.
Conclusion: The proposed method can effectively detect shorelines in different light environments and ensure that their contours are clear and complete. The method can be applied to the visual navigation of unmanned surface vehicles.
Keywords: unmanned surface vehicle; visual navigation; shoreline detection; HSV color space
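A compact OpenCV sketch of the kind of HSV pipeline described in the Method paragraph is given below: Gaussian filtering, conversion to HSV, nonlinear enhancement of the saturation channel, thresholding into a land mask, and edge detection. The enhancement exponent, threshold rule, and Canny parameters are illustrative, and the region/template matching stage is omitted.

```python
import cv2
import numpy as np

def detect_shoreline(bgr):
    # Gaussian filtering suppresses illumination changes and small water-surface disturbances
    blurred = cv2.GaussianBlur(bgr, (5, 5), 0)
    hsv = cv2.cvtColor(blurred, cv2.COLOR_BGR2HSV)
    s = hsv[:, :, 1].astype(np.float64) / 255.0
    s_enh = np.power(s, 0.5)                                # illustrative nonlinear enhancement
    # land areas tend to have higher saturation than sky and water
    land_mask = (s_enh > s_enh.mean() + s_enh.std()).astype(np.uint8) * 255
    edges = cv2.Canny(land_mask, 50, 150)                   # candidate shoreline edges
    return edges
```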
Abstract: Objective: The unit quaternion represents rotation more compactly and intuitively than the rotation matrix and is superior to Euler angle parameterization in avoiding gimbal lock. This study constructs a class of $C^3$-continuous unit quaternion interpolation spline curves. The curves in unit quaternion space are an extension of a class of quintic polynomial interpolation spline curves in Euclidean space. They maintain high continuity and interpolating properties and are suitable for controlling the orientation of a rigid body in keyframe animation.
Method: Rigid-body keyframe animation usually has two or more keyframes (nodes) to be connected, and the rigid body needs to pass through the keyframes exactly. When the second derivatives of the orientation curve are discontinuous at some nodes of the spline curve, the angular acceleration of the object suffers a large jump, resulting in undesirable rotation effects that are inconsistent with expectations. Therefore, constructing a high-order continuous unit quaternion curve is important in computer animation. This study first gives the definition and properties of quaternions and then selects proper quintic polynomial blending functions to achieve a smooth interpolating orientation curve. Interpolation spline curves are constructed in $R^3$ space by taking these quintic polynomial blending functions as the base functions. The constructed curves pass through the given data points and are $C^3$-continuous when the knot vector is evenly distributed. We then extend the spline curve from $R^3$ space into $S^3$ space, so the unit quaternion interpolation spline curve maintains many of the important properties of the quintic polynomial interpolation spline curve in Euclidean space, such as interpolation and high-order continuity. Inspired by Kim et al. [5], we take the cumulative forms of the constructed blending functions of the interpolation spline curves in Euclidean space as the base functions to construct unit quaternion interpolation spline curves in $S^3$ space. Specifically, the unit quaternion interpolation spline curve is the product of several exponential functions, in which the exponents are cumulative forms of the blending functions and the bases are constant unit quaternions, namely the given keyframe orientations and the angular velocities between every two adjacent keyframe orientations. As a result, the produced unit quaternion curves behave similarly to spline curves in Euclidean space. For example, they not only pass exactly through a given sequence of orientations (denoted by unit quaternions) without the iterative solution of a quaternionic nonlinear system of equations, which classical B-spline unit quaternion interpolatory curves require to obtain the control points from given data points, but also possess $C^3$ continuity. A strict proof of these properties is given, and an application example shows that the constructed unit quaternion curve can play a role in keyframe animation design.
Result: We provide a 3D keyframe interpolation animation instance and comparison experiments to illustrate the feasibility and effectiveness of the proposed unit quaternion orientation curve construction method. In the application example, the proposed curve automatically and accurately passes through a given sequence of keyframe orientations. The postures of cherries are controlled by the unit quaternion curve: red cherries are the randomly given keyframes, and the orientations of the intermediate green cherries are determined by three different types of unit quaternion spline curves, while the position curve of the cherries adopts the same cubic uniform B-spline curve. Unlike Nielson's iterative construction scheme for unit quaternion uniform B-spline interpolation curves, the proposed scheme avoids solving a nonlinear system of equations over quaternions to obtain the control points, which greatly improves computational efficiency. Unlike Su et al.'s algebraic-trigonometric blending interpolation quaternion spline curves, our scheme uses only polynomial bases, which evaluate faster than trigonometric functions. The time required for the animation example by our scheme decreases by approximately 73% and 33% compared with Nielson's and Su et al.'s methods, respectively. Moreover, the continuity of the curve produced by the proposed scheme is $C^3$, higher than that of the $C^2$-continuous curves constructed by the other two methods, which means that the changes in the orientation of the rigid body in the animation are more natural.
Conclusion: Simulation results demonstrate that the proposed method is effective for rigid-body keyframe animation design and particularly applicable to design occasions with high real-time and fluency demands. Although the proposed method has the advantages of automatic and accurate interpolation, high continuity, and fast running, this paper does not discuss the fairness of the curve. From numerous experimental results, we deduce that the fairness of the curve produced by the proposed method is not as good as that of unit quaternion uniform B-spline curves. The next step of our work will be to study the construction of quaternion interpolating spline curves from the aspect of energy optimization, such as minimizing torque energy or curvature.
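The cumulative exponential form described above can be sketched directly: the curve is the first keyframe orientation multiplied by exponentials of the inter-keyframe angular velocities scaled by cumulative blending functions. The quaternion helpers below follow that construction; the paper's specific quintic blending functions are not reproduced, so `blend(i, t)` is left as a user-supplied cumulative basis.

```python
import numpy as np

def qmul(p, q):
    w1, x1, y1, z1 = p; w2, x2, y2, z2 = q
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def qconj(q):
    # conjugate equals inverse for unit quaternions
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def qexp(v):
    # exponential of the pure quaternion (0, v)
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    return np.concatenate(([np.cos(theta)], np.sin(theta) * v / theta))

def qlog(q):
    # logarithm of a unit quaternion, returned as a 3-vector
    w, v = q[0], q[1:]
    n = np.linalg.norm(v)
    if n < 1e-12:
        return np.zeros(3)
    return np.arccos(np.clip(w, -1.0, 1.0)) * v / n

def cumulative_curve(quats, t, blend):
    # q(t) = q_0 * prod_i exp(omega_i * B~_i(t)), with omega_i = log(q_{i-1}^{-1} q_i);
    # blend(i, t) is a cumulative blending function supplied by the caller
    q = quats[0]
    for i in range(1, len(quats)):
        omega = qlog(qmul(qconj(quats[i - 1]), quats[i]))
        q = qmul(q, qexp(blend(i, t) * omega))
    return q
```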
Abstract: Objective: Numerous clinical studies have shown that changes in the volume or morphology of the hippocampus and its subfields are closely related to many neurodegenerative diseases, such as Alzheimer's disease and mild cognitive impairment. Accurate segmentation of the hippocampus and its subfields from brain magnetic resonance images, which is a prerequisite for volume measurement, plays a significant role in the clinical diagnosis and treatment of many neurodegenerative diseases. Despite significant progress in recent decades, hippocampal subfield segmentation remains a challenging task, mainly because the volume of the hippocampal subfields is very small and the boundaries between subfields are insufficiently distinctive to extract. Traditional hippocampal segmentation methods, which are based on multiple atlases, sparse representation, and shallow networks, cannot achieve satisfactory segmentation. By contrast, methods that use image features extracted by convolutional neural networks (CNNs) have demonstrated state-of-the-art results on a wide range of image segmentation tasks. In this study, a hippocampal subfield segmentation approach based on a CNN and a support vector machine (SVM) is proposed. The two models are combined to exploit their individual advantages and further improve the accuracy of hippocampal subfield segmentation.
Method: The proposed approach constructs a new model that combines a CNN and an SVM. Magnetic resonance image patches centered at the target pixel are fed to the network as input images. After a series of convolution and downsampling operations, the output features of a fully connected layer are used as the inputs of the SVM, which is trained with these features to perform pixel classification. The CNN consists of an input layer, three convolution layers, three downsampling layers, a fully connected layer, and an output layer, where the downsampling layers provide numerous abstract features for the segmentation task. Data augmentation with Gaussian noise and rotation operations is employed to expand the labeled data and prevent overfitting. The new model overcomes the shortcomings of the CNN and the SVM by combining their advantages. The CNN classifier, whose learning principle is basically the same as that of the multilayer perceptron, is prone to falling into a local minimum because the network is trained to minimize classification error by minimizing empirical risk. The SVM, in contrast, is based on the principle of structural risk minimization and minimizes the generalization error on the training data. By solving a quadratic programming problem, the SVM obtains a hyperplane that is the global optimal solution, thereby effectively avoiding local optima. Therefore, the generalization capability of the new model is significantly improved.
Result: In the experiments, the proposed approach is tested on the brain magnetic resonance images of 32 volunteers from the Center for Imaging of Neurodegenerative Diseases in San Francisco, California, USA. In the first part of the experiments, the approach is qualitatively and quantitatively compared with methods based on SVM, CNN, and sparse representation and dictionary learning. The segmentation Dice similarity coefficients (DSCs) of the proposed approach for Cornu Ammonis (CA) 1, CA2, dentate gyrus, CA3, head, tail, subiculum, entorhinal cortex, and parahippocampal gyrus are 0.969, 0.733, 0.977, 0.987, 0.981, 0.982, 0.972, 0.986, and 0.976, respectively. The comparisons demonstrate that the proposed method, which significantly improves the accuracy for all hippocampal subfields, outperforms the existing methods based on dictionary learning and sparse representation and on multiple atlases. For large subfields, such as the head of the hippocampus, the DSC is increased by 10.2% compared with those of state-of-the-art approaches. For small subfields, such as CA2 and CA3, the segmentation accuracies are also significantly increased, by 36.2% and 52.7%, respectively. In the second part of the experiments, the effects of image patch size, number of convolution layers, and number of convolution layer features on the segmentation results are tested with a control-variable method.
Conclusion: In this study, a CNN is introduced to extract image features automatically, and an SVM, instead of the CNN classifier, is used to classify image pixels. The proposed method improves the generalization capability of the classifier and overcomes the shortcoming of most traditional classifiers, which rely heavily on good hand-designed features whose design is a laborious and time-consuming task. Experimental results prove that the proposed method can effectively improve the segmentation accuracy of hippocampal subfields in brain magnetic resonance images, which provides a basis for the clinical diagnosis and treatment of many neurodegenerative diseases. Future work includes reducing the computation time of the algorithm, improving the segmentation accuracy of small hippocampal subfields by optimizing the algorithm, and extending the proposed network to the segmentation of other organs by fine-tuning the network parameters.
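Two small pieces of the pipeline above are easy to sketch: training an SVM on fully connected layer activations exported from the CNN (replacing the CNN's own classifier) and the Dice similarity coefficient used for evaluation. The scikit-learn kernel and regularization settings are assumptions, not the paper's configuration.

```python
import numpy as np
from sklearn.svm import SVC

def train_svm_on_cnn_features(fc_features, labels):
    # fc_features: (N, D) fully connected layer activations exported from the trained CNN;
    # the SVM takes over the pixel classification from the CNN's softmax classifier
    clf = SVC(kernel='rbf', C=1.0)
    clf.fit(fc_features, labels)
    return clf

def dice(pred, gt, label):
    # Dice similarity coefficient for one subfield label
    p, g = (pred == label), (gt == label)
    return 2.0 * np.logical_and(p, g).sum() / (p.sum() + g.sum() + 1e-12)
```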
Abstract: Objective: Diabetic retinopathy (DR) is a complication of diabetes and, if left untreated, causes severe vision loss and blindness in severe cases. Regular eye examinations are important for initial diagnosis and early treatment. Changes in the blood vessels of the retina are the leading cause of DR, and the formation of red lesions, such as hemorrhages/microaneurysms (HMAs), is the first explicit sign and an important symptom of DR. Hence, in a traditional DR diagnosis system, the accuracy of HMA lesion detection determines the final diagnosis performance. The diagnosis method produces a large number of false positive samples in pursuit of high sensitivity, and a supervised classification model is ineffective in removing these false positives when the dataset does not label the lesion areas. A new algorithm based on multi-kernel and multi-instance learning is proposed to overcome this limitation of supervised learning in DR diagnosis.
Method: First, a multi-scale morphological top-hat transformation is employed to enhance blood vessels and red lesions on the green channel image, and the main vessels of the retina are then segmented by a thresholding technique on the mask image obtained by binarizing the enhanced image. All regions of interest are generated by subtracting the main vessels from the mask image, and a connected-component labeling technique based on region growing is applied to detect suspicious HMAs. The detected HMA areas are considered instances, and the entire image is considered a bag; thus, DR diagnosis is cast as a multi-instance learning problem. Second, a 37-dimensional feature based on color, texture, and shape is extracted for each candidate HMA to describe the instance in multi-instance learning. Numerous suspected HMAs are generally obtained to ensure high sensitivity in the initial lesion detection, but too many instances negatively affect the performance of multi-instance learning. An extreme learning machine (ELM)-based classifier is accordingly constructed to filter irrelevant instances and improve the multi-instance learning performance. Nevertheless, no publicly available database contains both the ground truth of DR diagnosis at the image level and HMA segmentation at the lesion level. For example, the MESSIDOR dataset contains DR diagnosis information but not the ground-truth locations of HMAs, whereas the E-ophtha dataset contains the location information of HMAs without the diagnosis label. Consequently, the ELM-based classifier trained on the E-ophtha dataset cannot be applied directly to the MESSIDOR dataset because of the difference between the datasets. A threshold on the output probability of the ELM-based classifier is therefore designed to filter the irrelevant instances, and the best threshold is obtained by cross-validation on the training set. Finally, a multi-instance learning method, mi-Graph, which assumes that the instances in a bag are not independently and identically distributed, is combined with a multi-kernel learning framework and adopted for DR diagnosis. The method implicitly constructs graphs by deriving affinity matrices and defines an efficient graph kernel that considers clique information. The kernel in the multi-kernel learning is defined as a linear combination of multiple kernels, including Gaussian, polynomial, and linear kernels. As a result, a multi-instance learning model based on a multi-kernel graph is constructed to classify an input retinal image into DR or no-DR status.
Result: The evaluation is conducted on 1 200 images from the publicly available MESSIDOR dataset, which provides DR diagnosis results. We verify the effectiveness of the proposed method and of the irrelevant-instance filtration method and analyze the contributions of different features to DR diagnosis within the multi-kernel learning framework. We compare our method with other multi-instance learning methods, such as the iterative axis-parallel rectangle, expectation-maximization diverse density, citation-k-nearest neighbor, and multiple-instance support vector machine methods, as well as with other DR diagnosis methods on the MESSIDOR dataset. Our proposed method achieves an accuracy of 90.1%, a sensitivity of 92.4%, a specificity of 91.4%, and an area under the receiver operating characteristic curve of 0.932. The results show that the proposed method performs better than the other multi-instance learning methods and is comparable to previous DR diagnosis methods.
Conclusion: A multi-instance learning algorithm is introduced into DR diagnosis in this study. The detected HMAs and the entire image are considered the instances and the bag of multi-instance learning, respectively. The relationship among the instances in a bag is established by using a graph kernel, and a multi-kernel learning framework is adopted to enhance the generalization performance. Consequently, a multi-instance learning model based on a multi-kernel graph is constructed for DR diagnosis. The experimental results indicate that the proposed approach can diagnose DR efficiently without lesion-level label information, thereby avoiding the time-consuming effort of labeling the lesions by specialists and the need for false positive reduction.
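The multi-kernel component described above is a linear combination of Gaussian, polynomial, and linear kernels. A small numpy sketch is given below; the mixing weights, gamma, and polynomial degree are illustrative placeholders, and the mi-Graph graph kernel itself is not reproduced.

```python
import numpy as np

def combined_kernel(X, Y, weights=(0.5, 0.3, 0.2), gamma=0.1, degree=2):
    # linear combination of Gaussian (RBF), polynomial, and linear kernels
    lin = X @ Y.T
    d2 = np.sum(X ** 2, axis=1)[:, None] + np.sum(Y ** 2, axis=1)[None, :] - 2.0 * lin
    rbf = np.exp(-gamma * d2)
    poly = (1.0 + lin) ** degree
    w_rbf, w_poly, w_lin = weights
    return w_rbf * rbf + w_poly * poly + w_lin * lin
```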
Abstract: Objective: The development of 3D reconstruction technology has led to the wide application of 3D scanning in cultural relic restoration and reverse engineering. However, scanned models usually contain many holes after reconstruction because of the scanning environment, scanning techniques, and other factors. Models with holes hinder follow-up applications in the cultural relic restoration and industrial reverse engineering industries. Radial basis functions have been widely used in hole repair. However, implicit surfaces based on radial basis functions are sensitive to noisy data and therefore cannot be used efficiently to fill noisy holes. This study proposes an improved hole-filling algorithm based on radial basis functions. The algorithm mainly consists of smoothing the noisy data, interpolating and subdividing the hole structure, and fitting the radial basis function. In this way, the proposed algorithm can resolve the boundary mutation problem in noisy hole repair.
Method: First, the algorithm uses the Laplace function to smooth the boundaries of noisy holes. We use the topological structure of adjacent triangles on the model to check the boundaries of holes and identify them. After the holes are identified, their boundaries are smoothed over multiple neighborhoods before the implicit surface is built. The algorithm uses the smoothed data to solve for the implicit surface on the basis of the radial basis function. Second, the algorithm subdivides the holes by applying a fast barycentric interpolation method for regular holes; the centers of gravity of the boundary and of the converging region are used to perform the subdivision. Third, the algorithm incorporates the curvature features around the holes so that the filled holes have features consistent with the scanned data. Therefore, boundary and normal constraint points are used to define the implicit surface solution: the boundary constraint points are taken from multiple neighborhoods around the hole boundary, and the normal constraint points are taken along the normal direction. Finally, the algorithm adjusts the interpolation points with the gradient descent method. The points obtained by interpolating and subdividing the holes usually do not lie on the scanned surface, so the algorithm uses the partial derivatives of the obtained implicit surface equation to move the interpolation points rapidly onto the surface. We use the gradient descent method and a preset error threshold to adjust the interpolation points of the holes. In this way, a smooth patch of the filled region is achieved.
Result: 3D classical models and actual mechanical workpieces are scanned to verify our algorithm. A comparison experiment is conducted by applying the wave-front method and the Geomagic software method to the same holes. Our algorithm uses a preprocessing step to smooth the holes and keeps the curvature features around them. An implicit surface based on the radial basis function is applied to ensure the smoothness of the repaired holes, which become highly consistent with the surrounding curvature. The experimental results show that the proposed algorithm based on radial basis functions is adaptable to noisy holes, and its results are consistent with the curvature variation around the holes. Therefore, the filling results are natural and smooth. The algorithm effectively alleviates the existing problems of boundary mutation and obvious repair traces.
Conclusion: The algorithm solves the noise sensitivity problem of implicit surfaces based on radial basis functions by exploiting the curvature features around the holes. However, the results of the algorithm are not ideal for some complicated holes, and the fast barycentric interpolation function also has limitations for irregular holes.
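The radial-basis-function step above can be sketched as a linear solve for RBF weights from boundary and normal (offset) constraint points, followed by evaluation of the implicit function; this minimal version uses the cubic kernel, omits the usual low-degree polynomial term, and leaves out the gradient-descent projection of the interpolated points, so it is an illustration rather than the paper's solver.

```python
import numpy as np

def fit_rbf_implicit(points, values, reg=1e-8):
    # points: (N, 3) constraint points (boundary points with value 0, points offset along
    # normals with value +/- d); solves for weights of f(x) = sum_i w_i * phi(|x - p_i|)
    # with the cubic kernel phi(r) = r^3 (polynomial term omitted for brevity)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    A = d ** 3 + reg * np.eye(len(points))
    return np.linalg.solve(A, values)

def eval_rbf(x, points, w):
    # implicit function value at x; its zero level set is the repaired surface
    r = np.linalg.norm(points - x, axis=1)
    return float(w @ (r ** 3))
```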
摘要:ObjectiveSingle-image super-resolution (SR) is a classical problem in computer vision. In visual information processing, high-resolution images are still desired for considerable useful information, such as medical, remote sensing imaging, video surveillance, and entertainment. However, we can obtain low-resolution images of specific objects in some scenes only, such as long-distance shooting, due to the limitation of physical devices. SR has attracted considerable attention from computer vision communities in the past decades. We address the problem of generating a high-resolution image given a low-resolution image, which is commonly referred to as single-image SR. Early methods include bicubic interpolation, Lanczos resampling, statistical priors, neighbor embedding, and sparse coding. In recent years, a series of convolutional neural network (CNN) models has been proposed for single-image SR. Deep learning attempts to learn layered, hierarchical representations of high-dimensional data. However, the classical CNN for SR is a single-input model that limits its performance. These CNNs require deep networks, considerable training consumption, and a large number of sample images to obtain images with good details. These requirements lead to the use of numerous parameters to train the networks, the increased number of iterations for training, and the need for large hardware. In view of these existing problems, an improved super-resolution reconstruction network model is proposed.MethodUnlike the traditional single-input model, we adopt a mutual-detail convolution model with double input. The combination of paths of different scales enables the model to synthesize a wide range of receptive fields. The different features of image blocks with different sizes are complemented at different scales. Low-dimensional and high-dimensional features are combined to supplement the details of the restoration images to improve the quality and detail of reconstructed images. Traditional self-similarity-based methods can also be combined with neural networks. The entire convolution model can be divided into three parts:F1, F2, and F3 networks. F1 is the feature extraction and nonlinearly mapping network with four layers. Filters with spatial sizes of 9×9, and 3×3 are used. F2 is the detail network used to complement the features of F1. F2 consists of two layers and filters with spatial sizes of 11×11 and 5×5. F3 is the reconstruction network. We use mean squared error as the loss function. The loss is minimized using stochastic gradient descent (SGD) with the standard backpropagation. The network takes an original low-resolution image and an interpolated low-resolution image (to the desired size) as inputs and predicts the image details. Our method adds a new input to supplement the high-frequency information that is lost during the reconstruction process. As shown in the literature, deep learning generally benefits from big-data training. We use a training dataset of 500 images from BSD500, and the flipped and rotated versions of the training images are considered. We rotate the original images by 90° and 270°. The training images are split into 33×33 and 39×39, with a stride of 14, by considering training time and storage complexities. We set a mini batch size of SGD to 64 and the momentum parameter to 0.9.ResultWe use Set5 and Set14 as the validation sets. From previous experiments, we follow the conventional approach to super-resolving color images. We transform the color images into the YCbCr space. 
The SR algorithms are applied only on the Y channel, whereas the Cb and Cr channels are upscaled by bicubic interpolation. We show the quantitative and qualitative results of our method in comparison with those of state-of-the-art methods. Compared with traditional methods and SRCNN, our method obtains better peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) values on the Set5 and Set14 datasets. For an upscaling factor of 3, the average PSNR values achieved by our method are 0.17 dB and 0.08 dB higher than those of the next best approach, SRCNN, on the two datasets. A similar trend is observed when we use SSIM as the performance metric. Compared with the training of SRCNN, the number of iterations of our approach is decreased by two orders of magnitude. With a lightweight structure, our method achieves performance superior to that of state-of-the-art methods.ConclusionThe experiments show that the proposed method can effectively reconstruct images with considerable detail using minimal training and relatively shallow networks. However, compared with the results of very deep neural networks, the results of our method are not sufficiently precise, and the network structure is relatively simple. In future work, we will consider using deeper layers to acquire more image features at different layers and extending our model to other image tasks.
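To make the double-input structure described above concrete, the following is a minimal sketch, assuming PyTorch and one plausible way to fuse the two paths (upsampling the F2 detail features to the interpolated-image size and concatenating them with the F1 features before reconstruction); the fusion scheme, channel widths, and learning rate are illustrative assumptions, not the authors' published configuration.

```python
# Hedged sketch of a dual-input ("mutual-detail") SR network:
# F1 maps the bicubic-interpolated LR image through four conv layers
# (9x9 and 3x3 kernels), F2 extracts complementary detail features from
# the original LR image with two conv layers (11x11 and 5x5 kernels),
# and F3 reconstructs from the fused features.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MutualDetailSR(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        # F1: feature extraction + nonlinear mapping on the interpolated input
        self.f1 = nn.Sequential(
            nn.Conv2d(1, channels, 9, padding=4), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
        )
        # F2: detail path on the original (small) low-resolution input
        self.f2 = nn.Sequential(
            nn.Conv2d(1, channels, 11, padding=5), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 5, padding=2), nn.ReLU(inplace=True),
        )
        # F3: reconstruction from the concatenated features (assumed fusion)
        self.f3 = nn.Conv2d(2 * channels, 1, 5, padding=2)

    def forward(self, lr, lr_interp):
        a = self.f1(lr_interp)
        b = F.interpolate(self.f2(lr), size=lr_interp.shape[-2:],
                          mode="bicubic", align_corners=False)
        return self.f3(torch.cat([a, b], dim=1))


# Training skeleton matching the abstract: MSE loss, SGD with momentum 0.9,
# mini-batch size 64 (the patch data loader itself is omitted here).
model = MutualDetailSR()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.MSELoss()
```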
摘要:ObjectiveSaliency detection of images aims to predict the areas that are most important and contain the most abundant information in an image. It has been applied to image segmentation, image compression, object recognition, and other tasks. Numerous algorithms can be used to compute the saliency of an image, but the resulting saliency map may be incomplete and limited for methods that use only the color information of images. A new saliency detection method that combines the color contrast with the color space distribution of images is proposed.MethodThe simple linear iterative clustering (SLIC) super-pixel segmentation algorithm and the K-means clustering algorithm are used to obtain relevant features and reduce time complexity. We utilize a smoothing filter to remove image noise and acquire a smooth image. We divide our algorithm into three steps. First, the image is preprocessed by using the SLIC algorithm, and a color contrast map is obtained according to the color difference among pixel blocks. A pixel is the basic unit of an image. The color difference between a pixel and its surroundings can partly determine whether the pixel belongs to the salient area. According to this rule, we use pixel blocks instead of individual pixels, compute the color difference among pixel blocks, and combine it with the distance among pixel blocks to obtain a color contrast map. Second, K-means clustering of the image according to color features is performed. The filtered image is classified into M classes according to the features in LAB space, and the initial color space distribution of each class is computed on the basis of the compactness of the spatial distribution and the uniformity of the color distribution. The spatial color distribution of each class is mapped to super-pixel blocks, and the color space distribution map is further optimized to avoid the lack of spatial information in the clustering results. Third, the color contrast map is combined with the feature map of the image color spatial distribution to obtain the final saliency map.ResultOur method is compared with several popular detection algorithms on a public image test database called MSRA-1000. Experimental results demonstrate that the saliency map obtained in this study is accurate and complete, which illustrates the effectiveness of the proposed method. The resulting precision-recall and receiver operating characteristic curves show that the proposed method has high accuracy under the same recall rate or the same false positive rate.ConclusionThe proposed algorithm combines SLIC super-pixel segmentation and K-means clustering, which exploits the advantages of the two methods and reduces time complexity. However, the proposed method still has some shortcomings. For example, the color space distribution of images with a complex color distribution may not be good enough, which will result in an incomplete final saliency map. Improvement of the proposed algorithm will concentrate on more general applications.
关键词:saliency detection;simple linear iterative clustering (SLIC);super-pixel segmentation;K-means clustering;image color contrast;space distribution
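The following is a simplified sketch of the three-step pipeline described above, assuming Python with scikit-image and scikit-learn; the contrast weighting, the compactness measure, and the way the two cues are fused here are illustrative assumptions rather than the authors' exact formulas.

```python
# Hedged sketch: SLIC superpixels give a block-level color-contrast map,
# K-means in Lab space gives a color spatial-distribution map, and the
# two cues are multiplied into a final saliency map.
import numpy as np
from skimage import color, filters, segmentation
from sklearn.cluster import KMeans


def saliency(image_rgb, n_segments=200, n_classes=8):
    img = filters.gaussian(image_rgb, sigma=1, channel_axis=-1)  # smoothing
    lab = color.rgb2lab(img)
    h, w = lab.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]

    # 1) Block-level color contrast from SLIC superpixels.
    labels = segmentation.slic(img, n_segments=n_segments, start_label=0)
    n = labels.max() + 1
    mean_lab = np.array([lab[labels == i].mean(axis=0) for i in range(n)])
    cx = np.array([xs[labels == i].mean() for i in range(n)]) / w
    cy = np.array([ys[labels == i].mean() for i in range(n)]) / h
    pos = np.stack([cx, cy], axis=1)
    color_d = np.linalg.norm(mean_lab[:, None] - mean_lab[None], axis=-1)
    spat_d = np.linalg.norm(pos[:, None] - pos[None], axis=-1)
    contrast = (color_d * np.exp(-spat_d / 0.25)).sum(axis=1)  # assumed weight
    contrast_map = contrast[labels]

    # 2) Color spatial distribution from K-means clustering in Lab space.
    km = KMeans(n_clusters=n_classes, n_init=4).fit(lab.reshape(-1, 3))
    cls = km.labels_.reshape(h, w)
    dist = np.zeros(n_classes)
    for c in range(n_classes):
        mask = cls == c
        if not mask.any():
            continue
        # Spatially compact classes are treated as more salient (assumption).
        spread = xs[mask].std() / w + ys[mask].std() / h
        dist[c] = np.exp(-spread)
    dist_map = dist[cls]

    # 3) Combine the two cues and normalize to [0, 1].
    sal = contrast_map * dist_map
    return (sal - sal.min()) / (sal.max() - sal.min() + 1e-8)
```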
摘要:ObjectiveThe parameter of a conventional C-Bézier curve is often limited to a closed interval. In this study, we focus on an integral curve made up of the traditional inner segment on a finite closed interval (such as [0, $α$]) and a part outside the interval. However, in computer-aided design, the modeling of an integral curve is often expressed in different stages, which results in redundant data. In fact, when modeling an entire curve, the control points of different segments may be related to one another. Therefore, if the control points of one segment and some shape parameters are stored, then the entire curve may be obtained, and the stored curve data may be decreased. We need to determine the relations among different segments and judge whether they lie on the same integral curve to decrease the final redundancy. We raise two questions: 1) Given an inner curve, can any segment of its integral curve be presented in an inner form? and 2) Are two neighboring inner C-Bézier curves on the same integral curve? We select the C-Bézier curve for our research. The focus of this study is the changes in the control vertices of the C-Bézier curve when the original parametric region [0, $α$] is scaled over (-∞, +∞).MethodAny C-Bézier curve is divided into two arcs from a geometric point of view: a central Bézier curve and a trigonometric part. On the basis of their movements, any segment of an integral C-Bézier curve can be represented in an inner form. We analyze the relations of the control vertices from an algebraic perspective and give three forms (direct, subdivision, and linear interpolation) relating the newly produced control points in the movement to the old ones. First, we represent certain segments of the integral C-Bézier curve in an inner form, consider the basis functions recursively, and compare them to obtain the direct form in terms of the original control points. Second, one endpoint of the moving segment is considered, which relates to a subdivision scheme. The scheme subdivides the inner curve into two neighboring C-Bézier segments. Similar to the direct form, the expressions can be easily worked out by using recursive evaluation. Third, we consider a corner-cutting form under a special subdivision situation to identify the linear interpolation from easy to difficult. The corner-cutting form is an alternative to the direct form and can be obtained by the knot-inserting process. However, the NUAT B-spline generated after interpolation cannot adapt to the integral case because it is piecewise and vanishes outside the interval. We use the area between the corner-cutting scheme and the t-axis to extend the scheme. Subdivision with a corner-cutting form is obtained on the basis of recursion and the relations between the basis and the subdivision scheme. The linear interpolation form is considered for moving along an integral C-Bézier curve in the general case. The length of the parameter interval of an inner C-Bézier curve needs to be less than π; thus, the corner-cutting form under the special subdivision situation cannot be directly extended. Although the parameter interval of the C-Bézier curve changes, the position of each point on the integral curve never changes. We utilize an evaluation scheme to solve the extension problem. Consistent with the Bézier curve case, the results of the movement along an integral C-Bézier curve with the linear interpolation form are obtained by using recursive evaluation twice.
Finally, we establish an algorithm to judge whether two given inner C-Bézier curves lie on the same integral curve, considering that the integral curve can be used to reduce redundant data. The error of the C-Bézier curve can be bounded by the error of its control points; hence, we use an error term to control the judgment accuracy after calculating the control points by the direct form.ResultThis study focuses on C-Bézier curves and regards the traditional inner part and the extended part outside the interval as one integral curve. An inner C-Bézier curve can be moved along the integral curve while its parameter interval length is less than π, and the motion curves can be represented in an inner C-Bézier form when their parameter interval lengths lie in (0, π). The new control points can be obtained from the old ones by a direct linear combination or by stepwise linear interpolation (including traditional interpolation and extrapolation). A subdivision scheme of the inner C-Bézier curve, including direct and corner-cutting forms, is included as a subcase. The integral curve and the movement along it may be used to reduce redundant storage data.ConclusionThe applications are as follows: First, the movement along an integral C-Bézier curve can be used to scale the parameter interval of a given C-Bézier curve. Second, the integral curve can be used to reduce redundancy by focusing on one part and extending the parameter interval. Third, two neighboring C-Bézier curves can be judged on whether they lie on the same integral curve within a permissible error. If they are on the same curve, then the data of one curve may be dropped during storage, whereas the data of the other are saved. However, the movement process is slow because of the recursive integral definition of the C-Bézier basis. In the future, we may consider accelerating the movement method or extending it to other types of Bézier-like curves.
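The judgment step above depends on the C-Bézier basis, whose recursive integral definition is not reproduced here. As an illustration only of the same idea, i.e., move one segment onto the other's parameter range by repeated linear interpolation/extrapolation and then compare control points within an error tolerance, the sketch below uses ordinary polynomial Bézier curves, which share the corner-cutting (de Casteljau) structure; it is not the paper's C-Bézier algorithm.

```python
# Hedged analogy: extend a Bezier segment to a neighboring parameter range
# by de Casteljau interpolation/extrapolation, then judge whether a second
# segment is its continuation within a control-point tolerance.
import numpy as np


def _levels(ctrl, t):
    """All de Casteljau levels at parameter t (t may lie outside [0, 1])."""
    levels = [np.asarray(ctrl, dtype=float)]
    while levels[-1].shape[0] > 1:
        p = levels[-1]
        levels.append((1.0 - t) * p[:-1] + t * p[1:])
    return levels


def restrict(ctrl, a, b):
    """Control points of the same polynomial curve on [a, b] (which may
    extend past [0, 1]); the ordinary-Bezier analogue of 'moving' a
    segment along its integral curve."""
    left = [lv[0] for lv in _levels(ctrl, b)]           # curve on [0, b]
    lv2 = _levels(left, a / b)
    return np.array([lv[-1] for lv in reversed(lv2)])   # curve on [a, b]


def on_same_integral_curve(ctrl_a, ctrl_b, eps=1e-8):
    """Judge whether segment B is the continuation of segment A on [1, 2],
    i.e., both lie on one integral curve, within control-point error eps."""
    pred_b = restrict(ctrl_a, 1.0, 2.0)
    return np.max(np.abs(pred_b - np.asarray(ctrl_b, dtype=float))) < eps


# Example: a cubic segment and its extrapolated continuation are judged
# to lie on one integral curve.
A = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 2.0], [3.0, 0.0]])
B = restrict(A, 1.0, 2.0)
print(on_same_integral_curve(A, B))   # True
```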
摘要:ObjectiveHazy or foggy images have low visibility, contrast, and saturation. These characteristics are undesirable in computer vision applications, such as most automatic systems for surveillance, intelligent vehicles, and outdoor object recognition. Thus, image haze removal plays a significant role in computer vision. Many haze removal methods, including image enhancement and restoration methods, have been proposed. Image enhancement methods often suffer from information loss, over-enhancement, distorted colors, and significant halos. Image restoration methods are based on hazy image degradation models and therefore perform better than enhancement methods. On the basis of image degradation models, the use of the dark channel prior to obtain haze-free images is widely accepted, but the restored images often suffer from the block effect. Some optimization methods have been proposed to eliminate the block effect. The visual effects are improved, but the time complexity is high. Thus, maintaining the high quality of haze-free images while ensuring low time complexity has always been a challenge for image defogging. We accordingly propose a multivariate-optimized haze removal method based on the hazy image degradation model.MethodUnlike traditional optimization methods that fix the other physical quantities while optimizing one physical quantity, the proposed method considers the correlation between the atmospheric light and the transmission in the degradation model. The atmospheric light and the transmission are treated as a whole, and we present a multivariate-optimized haze removal method based on iteration. The multivariate-optimized haze removal method can optimize the atmospheric light and the transmission at the same time. The iterated dehazing is an optimization process that continuously improves the haze-free image. We provide constraint conditions for multi-threshold fusion on the basis of the statistical characteristics of haze-free images to keep the restored image real and natural. The constraint conditions are utilized to control the dehazing degree, and the haze-free images are then obtained.ResultHaze removal experiments on images show that the performance of the proposed method in terms of visual effect, color histogram, information entropy, and time complexity is better than that of other classical dehazing algorithms. In the subjective aspect, compared with other fog removal methods, the proposed method obtains more natural and realistic images. The proposed method preserves considerable image detail and structure without color distortion. The halo effect and artifacts are considerably reduced in the images produced by the proposed algorithm. The intensity of the restored images is well preserved. In the objective aspect, the color histogram of a hazy image presents a convergent distribution, and the color intensity is high because of the fog cover. After removing the haze, the color histogram becomes evenly distributed, and the color intensity is enhanced. Thus, the color histogram of the dehazed image should be similar in structural shape to that of the hazy image. The color histograms of our resultant images and the hazy images are highly similar in shape. The image information entropy of the resultant images is relatively high. For the images of Cones, Herzeliya, House, and Dolls, the image information entropies are 13.801 270, 15.490 912, 15.395 014, and 16.276 838, respectively. We also evaluate the time complexity of our algorithm.
Our method is three to five times faster than haze removal methods that optimize the transmission via soft matting.ConclusionThis paper presents a multivariate-optimized haze removal method based on the hazy image degradation model. The proposed method can not only optimize multiple variables but also achieve low time complexity. Constraint conditions for multi-threshold fusion, based on the statistical characteristics of haze-free images, are developed to control the number of iterations, which contributes to the robustness of the method. Extensive experiments show that the proposed method obtains satisfactory results in both objective and subjective aspects.
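The joint iteration over atmospheric light and transmission and the multi-threshold fusion rules are specific to the paper and are not reproduced here. For reference, the sketch below (Python with NumPy/SciPy) only illustrates the underlying degradation model I = J·t + A·(1 − t) and the classical dark-channel estimates that such restoration methods start from; the window size, ω, and t0 values are common defaults, not values taken from the paper.

```python
# Hedged baseline sketch: dark-channel estimates of atmospheric light A and
# transmission t, followed by inversion of the degradation model.
import numpy as np
from scipy.ndimage import minimum_filter


def dark_channel(img, size=15):
    # Per-pixel minimum over color channels, then a local minimum filter.
    return minimum_filter(img.min(axis=2), size=size)


def estimate_atmospheric_light(img, dark, top=0.001):
    # Average the hazy-image pixels at the brightest 0.1% of the dark channel.
    n = max(1, int(dark.size * top))
    idx = np.argpartition(dark.ravel(), -n)[-n:]
    return img.reshape(-1, 3)[idx].mean(axis=0)


def dehaze(img, omega=0.95, t0=0.1, size=15):
    """One non-iterative pass of model-based dehazing on a float RGB image
    in [0, 1]: estimate A and t, then invert I = J*t + A*(1 - t) for J."""
    dark = dark_channel(img, size)
    A = estimate_atmospheric_light(img, dark)
    t = 1.0 - omega * dark_channel(img / A, size)
    t = np.clip(t, t0, 1.0)[..., None]
    return np.clip((img - A) / t + A, 0.0, 1.0)
```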