Abstract: Image segmentation is one of the most important research topics in image processing and is widely used in practice. Most segmentation models based on PDEs or the calculus of variations are non-convex, so they are easily trapped in local minima, and the resulting segmentations are often unsatisfactory. In addition, these models are too slow to meet practical demands. Building on the background removal model and the region fitting method, this paper proposes a new image segmentation model. First, following the principle of background removal, we modify the original background removal model. Applying the region-scalable fitting method and the Heaviside function then yields a new region-scalable fitting background removal model. However, this improved model is not convex and cannot guarantee a global minimum, so we convexify it to obtain a convex model. Finally, the global minimum of the convex model is obtained using the Split Bregman method and the level set method. Numerical experiments show that the proposed model segments images better than the ICV (improved Chan-Vese), LK (Li-Kim), and CV (Chan-Vese) models. The experiments also show that the proposed model is more efficient than the RSF (region-scalable fitting) model when the segmentation results are similar. Finally, different initial positions have little effect on the segmentation results, which shows that the model has low sensitivity to the initial contour. On MRI and synthetic images, the proposed model obtains good segmentation results with high efficiency.
The experimental results also show that the proposed model is robust.
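The segmentation abstract above mentions the Heaviside function but does not say which smoothed form is used. A common choice in Chan-Vese-style level set models is the arctangent form, and the sketch below assumes that form. The names `heaviside_eps` and `dirac_eps` and the default `eps=1.0` are illustrative, not taken from the paper.

```python
import math


def heaviside_eps(z, eps=1.0):
    """Smoothed Heaviside: 0.5 * (1 + (2/pi) * atan(z / eps)).

    Assumed form; the abstract does not specify the regularization.
    """
    return 0.5 * (1.0 + (2.0 / math.pi) * math.atan(z / eps))


def dirac_eps(z, eps=1.0):
    """Derivative of heaviside_eps with respect to z (smoothed Dirac delta)."""
    return (eps / math.pi) / (eps * eps + z * z)
```

With a level set function that is positive inside the contour, `heaviside_eps` acts as a soft membership indicator for the inside region, and `dirac_eps` concentrates the update near the contour.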
Abstract: This paper proposes a cross dual-domain filter to denoise multi-modal images and to address the poor performance of time-domain joint filtering algorithms on such images. To make full use of the complementary information at the edges of a multi-modal image, the proposed algorithm applies a cross bilateral filter in the time domain. The residual is then filtered by wavelet shrinkage in the frequency domain to recover fine texture detail, and the results from the two domains are superimposed. On this basis, an iteration between the time and frequency domains is constructed, and the final result is obtained progressively by iterating with adjusted parameters. Compared with current denoising methods based on a single domain or a single modality, the proposed algorithm shows clear advantages in visual quality, edge sharpness, detail, and PSNR. It effectively exploits the complementary information between multi-modal images and combines the advantages of the time-domain and frequency-domain methods. The iteration also effectively suppresses ringing artifacts.
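To illustrate the cross bilateral filtering step named in the abstract above, here is a minimal one-dimensional sketch. It smooths a noisy `target` signal with range weights taken from a second `guide` signal, so edges present in the guide are preserved. The function name, the parameters `radius`, `sigma_s`, and `sigma_r`, and their defaults are assumptions made for illustration. The wavelet shrinkage step and the iteration between domains are not shown.

```python
import math


def cross_bilateral_1d(target, guide, radius=2, sigma_s=1.5, sigma_r=0.1):
    """Smooth `target` using weights computed from `guide` (same length)."""
    n = len(target)
    out = []
    for i in range(n):
        num = 0.0
        den = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            # Spatial weight: nearby samples count more.
            w_s = math.exp(-((i - j) ** 2) / (2.0 * sigma_s ** 2))
            # Range weight from the guide: samples across a guide edge count less.
            w_r = math.exp(-((guide[i] - guide[j]) ** 2) / (2.0 * sigma_r ** 2))
            num += w_s * w_r * target[j]
            den += w_s * w_r
        out.append(num / den)  # den >= 1 because the j == i term has weight 1
    return out
```

For example, with `guide = [0, 0, 0, 1, 1, 1]` and a noisy `target` that has a step at the same position, the output stays close to 0 on the left and close to 1 on the right instead of blurring across the step.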
Abstract: With the rapid development of network technology, digital images, as the main carriers of information, have been widely used in commercial, economic, aerospace, military, defense, and other fields. However, they carry a risk of information leakage during storage and transmission, so the security of image data must be addressed effectively. Moreover, some image encryption algorithms have poor self-adaptability and weak keys. This paper proposes an adaptive block diffusion encryption algorithm based on the ChaCha20 hash operation (BDCH) to solve these issues. First, BDCH converts the plain image into a square image by padding it with chaotic sequences generated by a piecewise linear chaotic map (PWLCM). Second, a 512-bit initial key and the sum of the plain-image pixels are used as input to the ChaCha20 hash operation to generate an initial 8×8 hash matrix. This matrix is then combined with a chaotic sequence produced by a PWLCM, whose inputs are the average of the initial hash matrix, the initial key, and the normalized average of the plain-image pixels, to form an 8×8 hash key matrix. Third, permutation and diffusion are performed simultaneously on the whole image using an Arnold map and the PWLCM. The image is then divided into an 8×8 array of non-overlapping blocks. Finally, the hash key matrix is used to diffuse the blocks in two rounds, which completes the encryption. BDCH addresses the weak-key problem by producing the hash key matrix jointly from the initial hash matrix and the PWLCM chaotic sequences, thereby increasing the key space. Plain-image information is also part of the input keys of ChaCha20, the Arnold map, and the PWLCM, which enhances self-adaptability and plaintext sensitivity. Simulation results and performance analysis on seven grayscale and color images show that BDCH achieves a larger key space, higher sensitivity to both the key and the plain image, and faster encryption.
The BDCH algorithm, which combines simultaneous permutation and diffusion with block-wise nonlinear diffusion by the hash key matrix, can effectively resist various attacks and is suitable for all kinds of grayscale and color images.
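The two chaotic building blocks named in the BDCH abstract can be sketched in their textbook forms. The code below shows one iteration of a piecewise linear chaotic map and one round of the Arnold cat map on a square image. It is not the BDCH algorithm itself: how the paper derives the control parameter, seeds the map, or couples these steps with the ChaCha20 output is not stated in the abstract, and the function names and the list-of-rows image layout are assumptions.

```python
def pwlcm(x, p):
    """One PWLCM iteration for x in [0, 1] and control parameter 0 < p < 0.5."""
    if x > 0.5:
        x = 1.0 - x  # the map is symmetric about x = 0.5
    if x < p:
        return x / p
    return (x - p) / (0.5 - p)


def pwlcm_sequence(x0, p, n):
    """Iterate the map n times from x0 and return the visited values."""
    seq = []
    x = x0
    for _ in range(n):
        x = pwlcm(x, p)
        seq.append(x)
    return seq


def arnold_map(pixels, n):
    """One Arnold cat map round on an n x n image stored as a list of rows.

    Pixel (x, y) moves to ((x + y) mod n, (x + 2y) mod n), a bijection.
    """
    out = [[None] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            nx = (x + y) % n
            ny = (x + 2 * y) % n
            out[ny][nx] = pixels[y][x]
    return out
```

Because the Arnold map only permutes pixel positions, it leaves the histogram unchanged, which is why schemes of this kind pair it with a diffusion step that alters pixel values.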
Abstract: Image dehazing is inherently an ill-posed problem that involves the extraction of targets of interest from a static image or a video sequence. Such technology is expected to find wide application in high-level image processing and visual engineering. Transmission estimation is the primary task in single image haze removal. The inhomogeneous fog distribution in a degraded image can lead to false estimations in the transmission estimation process. This paper proposes an image dehazing method that uses a visual information loss prior. Given that a hazy image of a natural scene generally exhibits low contrast and chromatic distortion, we bypass direct transmission estimation and instead solve an optimization problem on an information loss function. First, the proposed method divides hazy images into three vision areas according to fog density. Second, the loss function, which is built on the visible characteristics of hazy images, is minimized by stochastic gradient descent to obtain the local minimum transmission. Third, the divided dehazed areas are fused via multi-scale illuminance image segmentation with a linear filter. Fourth, the scene albedo is recovered by employing an atmospheric scattering model that uses the global transmission. The proposed and existing dehazing methods are evaluated qualitatively and quantitatively. The experimental results show that the proposed algorithm effectively removes haze from the degraded image and achieves higher-quality, halo-free, and detailed restorations than the existing dehazing methods. On average, the proposed method achieves 20% higher-quality restorations than the classic haze removal algorithms. The proposed method significantly enhances image visibility and demonstrates better haze removal performance than the existing dehazing methods.
Compared with the state-of-the-art method, the proposed algorithm is more successful in recovering images from moderate to thick foggy areas and is faster in real-time dehazing applications. The multi-scale and patch-based structure of our method allows us to reduce the running time in neighborhood operations. Future studies can use this method to improve prior knowledge on the effective evaluation mechanism of image dehazing.
Keywords: image dehazing;visual information loss prior;optimization problem;transmission separation;human visual perception
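The final recovery step in the dehazing abstract relies on the atmospheric scattering model. In its standard form, an observed intensity I relates to the scene radiance J, the transmission t, and the atmospheric light A by I = J·t + A·(1 − t). The sketch below inverts that relation for a single pixel value. The function names and the lower bound `t_min=0.1` are assumptions for illustration, and the paper's own transmission estimate is not reproduced here.

```python
def add_haze(j, t, a):
    """Forward model: observed intensity I = J * t + A * (1 - t)."""
    return j * t + a * (1.0 - t)


def recover_radiance(i, t, a, t_min=0.1):
    """Invert the model for one pixel value.

    The transmission is clamped to t_min (an assumed bound) so that
    dense-haze pixels do not cause division by a value near zero.
    """
    t = max(t, t_min)
    return (i - a) / t + a
```

For instance, a scene value of 0.3 seen through transmission 0.6 with atmospheric light 0.9 is observed as 0.54, and `recover_radiance(0.54, 0.6, 0.9)` returns approximately 0.3 again.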
Abstract: Pedestrian detection in video surveillance systems has always been a hot topic in computer vision research. These systems are widely used in train stations, airports, large commercial plazas, and other public places. However, pedestrian detection remains difficult because of complex backgrounds. The visual attention mechanism has attracted increasing interest in object detection and tracking research in recent years, and previous studies have achieved substantial progress. We propose a novel pedestrian detection method based on semantic features under the visual attention mechanism. The proposed semantic feature-based visual attention model is a spatial-temporal model that consists of two parts: the static visual attention model and the motion visual attention model. The static visual attention model in the spatial domain is constructed by combining bottom-up and top-down attention guidance. Based on the characteristics of pedestrians, the bottom-up visual attention model of Itti is improved by intensifying the orientation vectors of elementary visual features to make the visual saliency map suitable for pedestrian detection. In terms of pedestrian attributes, skin color is selected as a semantic feature for pedestrian detection. The regional and Gaussian models are adopted to construct the skin color model. Skin feature-based visual attention guidance is then proposed to complete the top-down process. The bottom-up and top-down visual attentions are linearly combined, using weights obtained from experiments, to construct the static visual attention model in the spatial domain. The spatial-temporal visual attention model is then constructed via the motion features in the temporal domain. Based on the static visual attention model in the spatial domain, the frame difference method is combined with optical flow to detect motion vectors. Filtering is applied to process the field of motion vectors.
The saliency of motion vectors is evaluated via motion entropy to make the selected motion feature more suitable for the spatial-temporal visual attention model. Standard datasets and practical videos are selected for the experiments, which are performed on a MATLAB R2012a platform. The experimental results show that our spatial-temporal visual attention model is robust under various scenes, including indoor train station surveillance videos and outdoor scenes with swaying leaves. Our proposed model outperforms the visual attention model of Itti, the graph-based visual saliency model, the phase spectrum of quaternion Fourier transform model, and the motion channel model of Liu in terms of pedestrian detection. The proposed model achieves a 93% accuracy rate on the test video. In summary, this paper proposes a novel pedestrian detection method based on the visual attention mechanism. A spatial-temporal visual attention model that uses low-level and semantic features is proposed to calculate the saliency map. Based on this model, pedestrian targets can be detected through shifts of the focus of attention. The experimental results verify the effectiveness of the proposed attention model for detecting pedestrians.
Abstract: Inductive loop sensors are commonly used to detect traffic violations, but they are expensive and difficult to maintain. Tracking vehicles or detecting violations through video analysis is an alternative that exploits computer vision. The calibration of intersection driveways, which is a basic task of an intelligent transportation system in detecting traffic violations, must therefore be addressed first. This paper presents a solution for extracting intersection backgrounds and marking out driveways. First, we propose a new background extraction algorithm, which inherits features of both the mean method and the frame difference method, to solve the problem of lane markers that are partly covered by vehicles. On the one hand, the algorithm uses the average image to estimate an extra multiplying factor that keeps the background relatively stable. On the other hand, it calculates the frame difference and uses this difference to update the background progressively. Thus, the proposed algorithm converges quickly when traffic runs smoothly and can be updated quickly during traffic hours. Second, lines are detected using the Canny edge detector and the Hough transform. Based on perspective transformation, clustering analysis, and prior knowledge, a mathematical filtering model is proposed to detect the driveway from these lines. Third, the proposed algorithm is verified experimentally. It produces a more robust result than the Gaussian mixture model with five Gaussian distributions, which is one of the most widely used background extraction methods. Using a manually produced background as ground truth, the gray value of each ground-truth pixel is compared quantitatively with that of the corresponding pixel in the extracted background.
The experimental results show that the accuracy of background extraction is 20% and 30% higher than that of the mean method and the traditional Gaussian mixture model, respectively. By using the cycle data of the traffic lights, ghosting can be avoided when vehicles stop at a red light. In other words, the proposed algorithm can distinguish a temporarily stopped object from the long-term stationary background. Similarly, the calibration method can precisely identify pseudo lane lines using a clustering or filtering strategy and produce a reliable result. The proposed algorithm has several merits, such as a fast convergence rate, higher accuracy, and excellent stability, and it can rapidly erase the virtual shadow of a vehicle. In addition, calibration can be accomplished in the daytime as a one-time effort. Thus, the proposed algorithm is suitable for calibrating driveways under normal lighting. The experiments also demonstrate the effectiveness and practicality of the proposed method. However, the algorithm still requires further optimization and analysis of its parameter settings. In the future, we will attempt to devise a method that selects suitable parameters adaptively by adopting machine learning approaches.
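The idea of letting the frame difference gate a progressive background update can be illustrated with a generic selective running average. This is not the algorithm from the abstract above, which also uses a multiplying factor from the average image and the traffic light cycle. The function name, the flat list-of-gray-values layout, and the defaults `alpha=0.05` and `thresh=10` are all assumptions.

```python
def update_background(background, prev_frame, frame, alpha=0.05, thresh=10):
    """Blend `frame` into `background` only where the scene is locally static.

    A pixel counts as static when its value changed by less than `thresh`
    between two consecutive frames. Moving pixels leave the background as is.
    """
    new_bg = []
    for b, p, f in zip(background, prev_frame, frame):
        if abs(f - p) < thresh:
            new_bg.append((1.0 - alpha) * b + alpha * f)
        else:
            new_bg.append(b)
    return new_bg
```

A plain scheme like this would slowly absorb a vehicle waiting at a red light into the background, which is the ghosting problem the abstract addresses with traffic light cycle data.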
Abstract: In target tracking, both the particle filter and frame-by-frame model updating are not robust to occlusion, illumination change, and self-rotation. To address these challenges, we propose a new visual object tracking method based on selective model updating without a fixed schedule. Forcing model updates at regular intervals results in target distortion and tracking drift because such updates do not consider occlusion and other background interference. Thus, we introduce a selection mechanism to ensure that the updated model is valid and accurate. Frame-by-frame detection also increases the tracking time and reduces tracking efficiency. We therefore detect object changes within a short period so that model updates do not significantly affect real-time tracking, which is of practical significance. We detect target changes regularly based on the particle filter and use the steepest gradient descent method to determine the update time. The steepest gradient descent method determines whether the threshold of background interference has been reached by comparing the pixel information errors of the target, the initial model, and the current model. The resulting model represents the target well, and the tracking box is closer to the ground truth than those of the other algorithms. The proposed model is also superior to the others in terms of center position error, coverage, accuracy, success rate, and time. Thus, the problems of occlusion, illumination change, and self-rotation can be handled by updating the model selectively. The proposed method demonstrates excellent robustness in various scenarios under the scale-invariant condition, because scale change is not considered.
Keywords: object tracking;particle filter;the steepest gradient descent method;determine the update time;selective;aperiodic model update
Abstract: LBP has been widely applied in texture classification and face recognition as a texture description operator because of its simplicity and high efficiency. However, the features extracted by the basic LBP operator and its variant, the LDP operator, are sensitive to noise, and only the sign of the difference among local pixels is used for encoding. This binarization is too simple to extract adequate texture feature information. Thus, this paper proposes a face recognition algorithm based on LDP with polarization encoding. First, the first-order derivatives along the 0°, 45°, 90°, and 135° directions are obtained. Second, the Stokes vector of the face image is built. Third, the texture feature of the face image is extracted from multiple directions. Fourth, following the encoding method of the azimuth of polarization, the histogram vector of each sub-block is weighted according to the image entropy, and the weighted histograms are concatenated to form the final face feature vector. The experiments obtain correct recognition rates of 97.4% and 92.22% on the ORL and YALE face databases, respectively, with a running time almost the same as that of the LBP and LDP algorithms. When the sample size is large, the complexity is lower than that of the LBP method. Under Gaussian noise, the correct recognition rates are 93.88% and 86.27%, and under salt-and-pepper noise they are 96.13% and 84.71%, all much higher than those of the LBP and LDP algorithms. The proposed algorithm based on polarization encoding can extract more discriminative texture features and achieve a high face recognition rate even in the presence of noise. This algorithm also has reference value for texture classification and object recognition in other fields.
Keywords: face recognition;texture feature;angle of polarization;local derivative pattern;local binary patterns;histogram
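For readers unfamiliar with the baseline that the face recognition abstract builds on, here is a minimal sketch of the basic 3×3 LBP code, which keeps only the sign of each neighbor-minus-center difference. The neighbor order (clockwise from the top-left corner) and the bit assignment are conventions chosen for this example. The LDP operator and the polarization encoding proposed in the paper are not shown.

```python
def lbp_code(patch):
    """Return the 8-bit LBP code of the centre pixel of a 3x3 patch."""
    center = patch[1][1]
    # Neighbours visited clockwise, starting at the top-left corner.
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(offsets):
        # Only the sign of (neighbour - centre) is kept: one bit per neighbour.
        if patch[r][c] >= center:
            code |= 1 << bit
    return code
```

For the patch `[[6, 5, 2], [7, 6, 1], [9, 8, 7]]`, the neighbors 6, 7, 8, 9, and 7 are at least as large as the center value 6, which sets bits 0, 4, 5, 6, and 7 and gives the code 241. Because a small amount of noise can flip any of these sign bits, the code is noise sensitive, which is the weakness the abstract points out.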
Abstract: Because line breakage affects the results and reliability of straight line matching, a new line matching method for close-range images under multiple conditions is proposed. First, the initial corresponding points are obtained using the scale-invariant feature transform algorithm and are refined by random sample consensus. The affine transform matrix is computed from the final corresponding points. The dense matching accuracy is improved through the affine transform, the Harris interest value, and the least squares method based on the constructed points. Second, the lines are extracted using the Freeman chain code priority algorithm, and the initial matching results are obtained from the positional relationship between the dense matching points and the lines in the search area. Third, the initial corresponding lines are refined using the line coincidence degree, and the endpoints of the extracted lines are determined under the epipolar constraint. A set of close-range images that contain rotation, scale change, and occlusion is used in the line extraction experiments. The experimental results show that, compared with other line matching methods, the proposed method successfully matches 1.07 to 4.1 times more lines and improves the accuracy of straight line matching by 0.6% to 53.3%. The proposed method also outperforms the existing methods in terms of accuracy and robustness. By setting multiple conditions, the search area for line features is effectively reduced in stereo image matching, thereby significantly improving the matching efficiency. Experiments are performed on close-range images under different geometric transformations. The proposed method can be used to solve line break and occlusion problems.
Abstract: This paper proposes a new method based on sparse optical flow to address the problem of target extraction and tracking in dynamic backgrounds. First, the pyramidal LK optical flow method is used to generate an optical flow image and to match the feature points between two images. Second, the feature points are divided preliminarily based on the displacement and direction information in the optical flow image. Third, the center iteration method is applied to remove the noise feature points that do not belong to the target motion area. Fourth, the maximum intersection of the target feature points in the first frames yields the stable target points that are tracked in the subsequent frames. When the target is blocked in subsequent frames, we apply the Kalman estimation method and introduce a blocked coefficient related to the feature points to predict the target location and to locate the target quickly upon its reappearance. The experimental results show that the proposed algorithm locates the target accurately. The false detection rate of the target feature points is reduced by 10%, and the tracking rate reaches as high as 97% even when the target is blocked. The proposed method meets real-time requirements in dynamic backgrounds and can be applied to tracking slow- or fast-moving targets in blocked or unobstructed scenes.
Abstract: In visual object tracking, the state of the target in every video frame is represented linearly using several templates learned online. Because of the complex interference caused by the target itself or by the scene, the modeling ability of the tracker depends greatly on the generalizability of the template data and on the precision of its error estimation. Many existing algorithms represent samples in vector form, which artificially changes the original data structure and severely damages the natural relationships among the pixels of a sample. In addition, this representation can inflate the data dimensionality, which significantly increases the computational complexity and wastes resources. This paper investigates the data representation and observation modeling mechanism of the video tracking framework and provides a more compact and effective solution based on multilinear analysis. In our framework, the candidate samples and their reconstructed signals are expressed in tensor form to maintain the original structure of the data. When the tracker outputs the candidate appearance models, the modeling tasks of the tracking system are organized using the multilinear characteristics of the tensor structures. The objective function is regularized using the tensor nuclear norm and the L norm in order to fully exploit the independences and interdependences of the observation models under a multitask state learning assumption. The structured tensor form used in the data prototypes and observation models can effectively address the data representation problems and computational complexity of the tracking system. This form also provides a simpler and more effective solution for the multitask joint learning of the candidate appearance models.
When the tracker encounters destructive noise interference, the tensor nuclear norm constraint on error estimation in the multitask joint learning framework fully exploits the available information about the target, thereby allowing the tracker to adapt to visual changes that result from intrinsic or extrinsic factors. Experimental results on several challenging image sequences demonstrate that the proposed method achieves more robust performance in object model representation. Across all image sequences, the average center location error and the average overlap rate of the tracked image patches (4.2 and 0.82, respectively) are better than those of several state-of-the-art tracking algorithms. Extensive experiments are performed to validate our algorithm. The tensor nuclear norm regression model and the error estimation mechanism of our algorithm can obtain, in real time, candidate states that are very similar to the actual object states. The tracker strictly evaluates the true state of each candidate in the multitask learning framework, thereby providing a better solution to the model degradation and drifting problems.
Abstract: Traffic is an important factor in any city. The urban road network becomes more complex along with the development of urban traffic. Such complexity causes visual disturbance, complicates line planning, interrupts the transfer of passengers between urban and metro buses, and hampers rapid positioning, among other problems. Thus, line deformation has become an important research topic in information visualization. To address this issue, we propose an automatic layout deformation algorithm for a formal road system based on constraint rules to improve the traffic network. With our method, users can rapidly find the routes to their destination points on the actual traffic map and reflect those routes on the actual map easily. The key points are extracted by preprocessing actual map data to generate an initial line map that is used as the layout. Afterward, the angle rule, side length rule, and non-overlapping rule are implemented as constraints using a force-directed algorithm. Multi-objective optimization is then performed using the hill climbing algorithm to restrict the line directions. We perform a subjective evaluation by questionnaire survey and use Road Net of Hefei City in China and Retro Map of Mexico as actual cases. The results of the experimental and comparative analysis reveal that around 69.6% of all users acknowledge the importance of legibility, aesthetics, convenience, and practicality. We perform a user planning route experiment in which the starting point is the same as the destination point. Each group of participants saves 26.2% of their time in the user planning path tests. The line deformation that is proposed based on the constraint rules can transform a complex urban road network into a simple line map that can be easily understood by individuals.
This technique eases the contradiction between complex urban circuits and the limitations of human memory and effectively prevents visual noise from complex urban road networks. This method is suitable for route planning, transferring between urban and metro buses, rapid positioning, and so on, thereby improving road efficiency and saving a large amount of time.
Abstract: Free viewpoint video (FVV), an emerging 3D video technology, is currently gaining popularity. As its primary advantage, FVV allows audiences to watch video from an arbitrary viewing angle and to experience a vivid stereoscopic effect, as though they were part of the scene they are watching. Depth image based rendering (DIBR), which can synthesize virtual views from the referenced views and associated depth information that are randomly located on terminal display devices, is a key FVV technology for generating 3D views. However, this technology cannot satisfy real-time needs, and the noise that appears in the synthesized images because of imperfect depth maps can lead to visual discomfort. Given the poor objective and subjective quality of the virtual views, this paper proposes a novel 3D-warping algorithm for DIBR to increase the rendering speed as much as possible. This study makes three main contributions. First, to reduce the computation time in 3D-warping, a mapping table for converting depth to parallax is built before the rendering. During the rendering process, the program looks up the mapping table to obtain the parallax and to avoid repetitive operations. Second, the depth maps of the referenced views are divided into equally sized square blocks. We adopt block-based 3D-warping, which requires only one warping operation, if the pixels in the same block have the same depth. Otherwise, we employ pixel-based 3D-warping, which warps the pixels one by one. The block-based 3D-warping saves much time by reducing the number of pixel mappings. Third, we propose a corresponding modified interpolation method for block- and pixel-based 3D-warping. The improved interpolation method combines nearest interpolation with splatting interpolation along the horizontal direction. Nearest interpolation is performed if the mapped pixels are near the pixels to be interpolated. Otherwise, splatting interpolation is performed.
The proposed interpolation method also improves the Z-buffer technique in the depth direction: it discards the mapped pixels if they are located too far from the cameras and otherwise calculates the cumulative average value based on the depth value. Extensive experiments show that the proposed algorithm not only saves 57.81% of the time on average compared with the integer-pixel synthesis scheme of VSRS3.5 but also improves PSNR and SSIM, two indexes for evaluating the objective quality of the virtual views, by 0.355 dB and 0.00115, respectively. The proposed algorithm is highly suitable for the integer-pixel rendering of DIBR in a parallel camera system and is especially effective for video sequences that have many flat areas in the associated depth maps. This algorithm not only accelerates the 3D-warping process but also improves the objective quality of the virtual views.
Keywords: free viewpoint video (FVV);depth image based rendering (DIBR);rendering;virtual view;3D-Warping;interpolation
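The depth-to-parallax table described in the DIBR abstract can be sketched as follows for a parallel camera setup. The sketch assumes the common 8-bit depth convention in which value 255 is the nearest plane and 0 the farthest, with inverse depth interpolated linearly between them. The abstract does not state which convention the paper uses, and the function and parameter names are hypothetical.

```python
def build_disparity_table(focal_px, baseline, z_near, z_far):
    """Precompute the disparity (in pixels) for each 8-bit depth value.

    Assumed convention: 1/Z = (v / 255) * (1/z_near - 1/z_far) + 1/z_far,
    and disparity = focal_px * baseline / Z for parallel cameras.
    """
    table = []
    for v in range(256):
        inv_z = (v / 255.0) * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
        table.append(focal_px * baseline * inv_z)
    return table
```

During warping, each pixel then needs only one table lookup, for example `shift = round(table[depth_value])`, instead of repeating the divisions for every pixel of every frame.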
Abstract: This paper proposes a novel evidential reasoning based region growing (ERRG) method to solve the segmentation problem of interactive medical CT images. ERRG considers several important features of medical images, such as the gray histogram, Gabor features, and the gray level co-occurrence matrix. The Bhattacharyya coefficient is used to measure the similarity between the adjacent pixels and the utility function and to merge the metric coefficients. However, given the low efficiency of ERRG, a parallel region segmentation algorithm for interactive medical images is mapped to the GPU to accelerate it. The true-positive fraction (TPF) increases significantly, the false-positive fraction (FPF) decreases significantly, and the speedup factor is 12. Real-time interactive medical image segmentation can be achieved using GPU acceleration.
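The Bhattacharyya coefficient mentioned in the ERRG abstract has a simple closed form for two histograms: after normalizing each to sum to 1, it is the sum over bins of the square root of the product of the two bin values. The sketch below computes it for plain Python lists. How ERRG combines this score with the Gabor and co-occurrence features is not described in the abstract and is not shown.

```python
import math


def bhattacharyya_coefficient(hist_p, hist_q):
    """Similarity in [0, 1] between two histograms with the same bins.

    Both inputs are normalized to sum to 1 before comparison, so raw
    counts can be passed directly. Each histogram must have a nonzero sum.
    """
    sum_p = float(sum(hist_p))
    sum_q = float(sum(hist_q))
    return sum(
        math.sqrt((p / sum_p) * (q / sum_q)) for p, q in zip(hist_p, hist_q)
    )
```

Identical histograms score 1 and histograms with no overlapping bins score 0, so a region growing rule can accept a neighbor when the coefficient exceeds a chosen threshold.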
Abstract: Traditional classification technologies cannot easily or accurately determine the spatial distribution of ground features in hyperspectral images because mixed pixels are widespread throughout the image. Sub-pixel mapping technology is an effective tool to solve this problem. The existing sub-pixel mapping methods that are based on linear optimization encounter two issues in practical implementation: inexact objective functions and excessive computation. This paper proposes a new sub-pixel mapping method to solve these problems. The algorithm framework is constructed by combining spectral unmixing with binary particle swarm optimization. The number of sub-pixels of each class in each pixel is estimated according to the results of spectral unmixing. The regional perimeter is modified by analyzing the influence on the perimeter and region number induced by some special cases, such as isolated points or regions that include only two points. The cost function is formulated by considering the regional perimeter and the number of connected regions. To reduce the running time of the algorithm, global analysis is replaced with local analysis according to the feature space distribution characteristics, and a new iterative optimization strategy is proposed. Compared with directly minimizing the region circumference based on the image chain code, the modified objective function emphasizes the boundary of most regions and does not yield any isolated points or regions that include only two points. The method also improves the recognition rate by more than 2% and the Kappa coefficient by more than 0.05. Moreover, the new iterative optimization strategy nearly halves the CPU time. The experimental results show that the proposed algorithm can improve the mapping accuracy and that the proposed optimization strategies can accelerate the mapping.
Given the weak spatial correlation in areas where the end members are uniformly mixed, the proposed algorithm is suitable for hyperspectral images without uniformly mixed areas.