Abstract: Light field photography is commonly used to solve computer vision problems. However, the study of light field photography-based computer vision is still at the initial stage, and a comprehensive theory on this issue has yet to be proposed. This paper presents a survey of light field photography in the context of computer vision, including 1) mainstream light-field cameras and the merits and demerits of their application to computer vision tasks; 2) calibrating, decoding, and pre-processing schemes for light-field cameras; 3) the techniques of rendering and reconstructing images from 4D light fields and the contributions of these techniques to computer vision tasks; and 4) the methods for extracting features or constructing descriptors for computer vision tasks from 4D light fields. The advantages and disadvantages of light field photography-based computer vision are discussed. Finally, the key problems in advanced research of light field photography and related computer vision are presented with some potential approaches. The conclusion of our study is that light field photography provides a novel view for computer vision research; thus, it can attract more attention from both academia and industry. Moreover, light field photography and computer vision are closely related and develop collaboratively.
Keywords: light field photography; computer vision; 4D light field; refocusing rendering; depth estimation
Abstract: The development of new disciplines in mathematics promotes research on encryption technology and cryptography. The semi-tensor product, a new mathematical tool, is a generalization of traditional matrix multiplication; it provides a new approach by which digital signal processing can be performed on high-dimensional matrices of different dimensions. In this paper, a novel image-encryption algorithm based on the semi-tensor product is proposed. The content of the plaintext is used as a key parameter. Then, a small reversible key matrix is constructed by using the Kronecker product; the key matrix is then used to change the values of pixels in the original image by applying the semi-tensor product. As a result, the dimensions of the original image can be much larger than those of the key matrix. Experiments are performed by using an 8×8 key matrix on images of various sizes. A comparison of the experimental results with those of previous methods shows that the proposed method achieves a high level of security with suitable processing performance. A small encryption matrix is proposed to encrypt and decrypt images, wherein the dimensions of the original image are larger than those of the key matrix. The computation on the data is effectively reduced, and the operational efficiency of the encryption process is enhanced. The experimental results also demonstrate that the proposed algorithm offers secure information protection and satisfies the processing time required by standard applications.
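The semi-tensor product at the core of the scheme above generalizes ordinary matrix multiplication to factors whose inner dimensions differ, which is what lets a small key matrix act on a much larger image. A minimal NumPy sketch of the operation follows; the key construction and pixel-scrambling steps of the actual algorithm are omitted, and `semi_tensor_product` is an illustrative name:

```python
import numpy as np
from math import lcm

def semi_tensor_product(A, B):
    """Left semi-tensor product of A (m x n) and B (p x q):
    with t = lcm(n, p), A |x| B = (A kron I_{t/n}) @ (B kron I_{t/p}),
    which reduces to the ordinary product A @ B when n == p."""
    n, p = A.shape[1], B.shape[0]
    t = lcm(n, p)
    return np.kron(A, np.eye(t // n)) @ np.kron(B, np.eye(t // p))

# Matching inner dimensions: the semi-tensor product is just
# ordinary matrix multiplication.
A = np.arange(4.0).reshape(2, 2)
B = np.arange(4.0, 8.0).reshape(2, 2)
same = np.allclose(semi_tensor_product(A, B), A @ B)

# Mismatched dimensions: a 2 x 4 "key" acting on a 2 x 3 block
# still yields a well-defined product of shape (m*t/n, q*t/p).
C, D = np.ones((2, 4)), np.ones((2, 3))
shape = semi_tensor_product(C, D).shape
```

This dimension flexibility is exactly what allows the 8×8 key matrix to operate on images of arbitrary size.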
摘要:In the process of debugging a C/C++ image-processing program, the debugger must have the ability to visualize the in-memory image data. Given that no study has yet to present an image debugger visualizer that can work in multiple-operation systems, an open-source, cross-platform image debugger visualizer design is proposed to fill the research gap. We utilize the Python interface of the GDB debugger, translate the byte array fetched from the debugged program to a Python two-dimensional array object, and visualize the array by using the Matplotlib library. The visualization process is executed in a worker thread, along with the traditional GDB text-based interface. Experiments are performed on Windows, Linux, and Mac Systems. Results show that the debugger has various features, such as zooming, panning the image, showing the pixel value of the specified position of the image, saving the image, and keeping the GDB text command line interface in a state where it can work interactively. The debugger design meets the requirements of developing and debugging an image program in different operating system platforms, thus compensating for the deficiencies of the GDB debugger in terms of image debugging ability, and significantly improving the efficiency of development and debugging.
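The decoding step described above, turning a raw byte buffer fetched from the debugged process into a two-dimensional array suitable for display, can be sketched as follows. The GDB glue code and the Matplotlib worker thread are omitted, and `bytes_to_image` is a hypothetical helper name:

```python
import numpy as np

def bytes_to_image(raw, height, width, dtype=np.uint8):
    """Interpret a raw byte buffer (as fetched from the debugged
    process via GDB's Python API) as a 2-D image array."""
    arr = np.frombuffer(raw, dtype=dtype)
    if arr.size != height * width:
        raise ValueError("buffer size does not match the given shape")
    return arr.reshape(height, width)

# Example: a 2 x 3 grayscale image serialized row-major.
raw = bytes([0, 64, 128, 192, 255, 32])
img = bytes_to_image(raw, 2, 3)
# In the visualizer, this array would be handed to a worker thread
# that displays it with Matplotlib's imshow.
```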
Abstract: To better balance binary feature descriptor algorithms based on manual design, which have superior real-time performance, and binary feature descriptor algorithms based on optimization learning, which have robust performance, this paper presents a binary feature description algorithm based on fast optimization screening of multi-scale rectangular areas (referred to as MRFO). The typical workpiece target in satellite assembly is identified. The proposed description algorithm divides images according to pixel gray value and gradient direction and smooths each sub-image with different Gaussian kernel functions to establish a multi-scale image set. Candidate rectangular areas are rapidly extracted through constraint conditions from the multi-scale sub-images or the multi-scale feature point neighborhood. In the training phase, the proposed algorithm calculates the score and optimal threshold of candidate rectangular areas by using optimization learning and selects the subset that has strong distinctiveness and low correlation. In the testing phase, the proposed description algorithm calculates the response values of the selected rectangular areas in the multi-scale feature point neighborhood and employs the optimal threshold for binarization to constitute the binary description vector of feature points. Experiments prove that the proposed algorithm demonstrates robust performance based on the ROC curve and the recall rate statistics at an accuracy rate of 80%. The average accuracy is 8% to 12% higher than those of the compared algorithms. In real video images, the proposed description algorithm can identify the typical workpiece target accurately.
The experiments also prove the superior real-time performance of the proposed algorithm; the execution time in the training phase is only 4.35% of that of the traditional optimization learning algorithm (and in the testing phase is only slightly higher than that of the binary description algorithm based on manual design). Compared with traditional binary feature description algorithms and float feature description algorithms, the proposed feature description algorithm can overcome interference from perspective, scale, and rotation transformations, as well as the influence of similar background information, and can identify the typical workpiece target accurately. The algorithm can help improve the accuracy and efficiency of satellite assembly and enhance the automation level of related domestic industries. The method is highly general and has good application potential.
Keywords: target recognition; feature description; optimization learning; fast screening; multi-scale rectangular area
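The testing-phase step described above, computing rectangular-area responses in a feature point neighborhood and binarizing them against learned thresholds, can be sketched with an integral image for O(1) rectangle means. This is a simplified illustration under assumed names; the training-phase score and threshold selection is omitted:

```python
import numpy as np

def integral_image(img):
    """Summed-area table: ii[y, x] = sum of img[0..y, 0..x]."""
    return img.cumsum(0).cumsum(1)

def rect_mean(ii, y0, x0, y1, x1):
    """Mean intensity of the inclusive rectangle [y0..y1] x [x0..x1],
    computed in O(1) from the integral image ii."""
    total = ii[y1, x1]
    if y0 > 0: total -= ii[y0 - 1, x1]
    if x0 > 0: total -= ii[y1, x0 - 1]
    if y0 > 0 and x0 > 0: total += ii[y0 - 1, x0 - 1]
    return total / ((y1 - y0 + 1) * (x1 - x0 + 1))

def describe(patch, rects, thresholds):
    """Binarize the response of each selected rectangle against its
    learned optimal threshold to form the binary descriptor."""
    ii = integral_image(patch.astype(np.float64))
    return np.array([1 if rect_mean(ii, *r) > t else 0
                     for r, t in zip(rects, thresholds)], dtype=np.uint8)

patch = np.arange(64, dtype=np.float64).reshape(8, 8)
rects = [(0, 0, 3, 3), (4, 4, 7, 7)]   # (y0, x0, y1, x1), inclusive
bits = describe(patch, rects, thresholds=[31.5, 31.5])
```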
Abstract: Computing salient information within digital images has been the focus of considerable research interest and has been applied in many applications, e.g., object detection, image segmentation, visual tracking, and content-based image retrieval. In most models, image saliency has been regarded as local regions that can be easily differentiated from their surroundings. However, the main problem with these models is that the detected salient regions have a limited ability to cover a complete object. A method that combines regions and edges is proposed to solve this problem. For regions, an isophote-based operator is designed to detect potential structures. According to findings from neurobiology and psychophysics, salient information can be defined as regions that pop out from their surroundings in certain feature channels, such as color, intensity, and orientation. Thus, the isophote-based operator is employed to extract salient regions from three kinds of features, namely, color, intensity, and orientation. The operator establishes a consistent measurement across the various feature channels, which makes multi-feature saliency easy to integrate. For edges, a global saliency detector is adopted with the multi-scale Beltrami filter. The multi-scale Beltrami filter can enhance edge information within images while blurring the detailed information of interior regions. After processing with the multi-scale Beltrami filter, the global saliency detector can locate salient edges easily. Finally, the salient regions and edges are integrated directly by a linear method. The database used in this study includes 1 000 images from a variety of sources and has ground truths in the form of accurate human-marked labels for saliency information. Two kinds of measurements are adopted in the experiment, namely, segmentation by fixed thresholding and segmentation by adaptive thresholding.
In both measurements, the proposed method exhibits impressive performance compared with nine other well-known methods, achieving a receiver operating characteristic area of 0.92 and average precision, recall, and F-measure of 0.5905, 0.6554, and 0.7470 in segmentation by adaptive thresholding. Image saliency is one of the key features for many applications. This study proposes a novel method that combines region and edge information to locate complete salient objects. As shown in the experiments, the proposed method has good applicability and robustness.
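The fixed-thresholding evaluation protocol mentioned above can be sketched as follows. The β² = 0.3 weighting is the convention commonly used in saliency evaluation to emphasize precision; it is an assumption here, not a value stated in the abstract:

```python
import numpy as np

def precision_recall_f(saliency, ground_truth, threshold, beta2=0.3):
    """Binarize a saliency map at a fixed threshold and score it
    against a human-marked binary mask."""
    pred = saliency >= threshold
    gt = ground_truth.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    f = ((1 + beta2) * precision * recall) / max(beta2 * precision + recall, 1e-12)
    return precision, recall, f

# Toy 2 x 2 saliency map against a human-marked mask.
sal = np.array([[0.9, 0.2], [0.8, 0.1]])
gt = np.array([[1, 0], [1, 1]])
p, r, f = precision_recall_f(sal, gt, threshold=0.5)
```

Sweeping the threshold over [0, 255] and plotting recall against false positive rate yields the ROC curve reported above.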
Abstract: Many implicit surface reconstruction methods focus mainly on coping with noisy point clouds but hardly with outliers. Implicit surface reconstruction methods generally require that point cloud normals be accurate and oriented. However, normal estimation is itself a complex problem for a point cloud laden with defects, such as noise, outliers, low sampling density, or registration errors. Hence, a new implicit surface reconstruction algorithm using the Voronoi covariance matrix is proposed. Poisson surface reconstruction with an indicator function (defined as 1 at points inside the model and 0 at points outside) is used to represent the reconstructed surface. The oriented point samples can be viewed as samples of the gradient of the indicator function of the model. The problem of computing the indicator function is reduced to finding the scalar function whose gradient best approximates a vector field defined by the samples. Poisson surface reconstruction is robust to noise, even with a few outliers. However, it requires a point cloud with oriented normals. The Voronoi covariance matrix of a point can be computed by integrating over the domain of its Voronoi cell. The eigenvector of the largest eigenvalue of the Voronoi covariance matrix adequately approximates the surface normal. The anisotropy of the Voronoi cell, which represents the confidence of the normal estimation, is the ratio between the largest and smallest eigenvalues. The Voronoi covariance matrix is used in this study instead of point cloud normals to avoid the difficulty of normal estimation in Poisson surface reconstruction. The differential equation of the implicit function is also formulated, which requires that the function gradient be aligned with the principal axes of the Voronoi covariance tensor field. The differential equation is solved in discrete exterior form. As a result, the surface reconstruction problem is reduced to a generalized eigenvalue problem.
Cholesky factorization from the TAUCS library and Arnoldi iteration from the ARPACK++ library are employed to solve the generalized eigenvalue problem for sparse, symmetric matrices. When subdividing the point cloud space, the length of the shortest tetrahedral edge is bounded to avoid excessive local subdivision. The thin shell around the point cloud, which is defined by probability measure theory, is further subdivided to increase surface reconstruction accuracy. Delaunay refinement is used to extract the isosurface. Compared with the traditional marching cubes method, Delaunay refinement guarantees that the reconstructed surface is homotopic to the implicit surface and is more flexible in terms of resolution. Experiments show that our algorithm can reconstruct the surface well from a point cloud even with noise and outliers. Triangle meshes of different resolutions can be generated by adjusting the Delaunay refinement parameter. The fit between the reconstructed surface and the point cloud is controlled by the fitness parameter, and different parts of the surface can be distinguished by adjusting it. However, the bottleneck of the algorithm lies in the high memory requirement and the time-consuming Cholesky factorization for large models. A new implicit surface reconstruction method using the Voronoi covariance matrix is presented. Results show that the algorithm is practical and can robustly reconstruct surfaces with a good aspect ratio and without point cloud normals.
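The normal-estimation idea at the heart of the method, eigen-decomposing a covariance matrix so that the largest-eigenvalue eigenvector approximates the normal and the eigenvalue ratio measures confidence, can be illustrated on synthetic cell samples. This is a simplified stand-in: the true Voronoi covariance matrix integrates over the Voronoi cell rather than averaging discrete samples:

```python
import numpy as np

def cell_covariance_normal(samples, center):
    """Eigen-decompose the covariance of points sampled inside a
    (Voronoi) cell about its site: the eigenvector of the largest
    eigenvalue approximates the surface normal, and the ratio of
    largest to smallest eigenvalue is the anisotropy (confidence)."""
    d = samples - center
    cov = d.T @ d / len(samples)
    w, v = np.linalg.eigh(cov)          # eigenvalues in ascending order
    normal = v[:, -1]
    anisotropy = w[-1] / max(w[0], 1e-12)
    return normal, anisotropy

# A synthetic cell elongated along z, as for a sample on a flat
# horizontal surface patch (Voronoi cells of surface samples are
# elongated along the normal direction).
rng = np.random.default_rng(0)
samples = rng.normal(size=(500, 3)) * np.array([0.1, 0.1, 2.0])
normal, aniso = cell_covariance_normal(samples, np.zeros(3))
```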
Abstract: Non-uniform rational B-splines (NURBS) are a unified mathematical representation for free-form curves and surfaces. This representation is invariant under common geometric transformations, such as translation, rotation, and parallel and perspective projections. The B-spline model has wide applications in the field of computer-aided design, such as determining whether two surfaces splice or not, which depends on whether there are matching curve segments among the contour lines of the surfaces. Therefore, mosaic fragment reconfiguration can be converted into an optimal curve-matching combinatorial problem. This paper applies the proposed method to the problem of building a similar combined curve, which filters primitives from a curve library (contour set) by optimizing the matching combination for an expectation curve (contour). The Kabsch algorithm is a method for calculating the optimal rotation matrix and translation vector that minimize the root mean squared deviation (RMSD) between two paired sets of points. In this paper, the paired sets of points are extracted from two curves described by the NURBS model. The minimum RMSD of the two curves is obtained via the optimal translation and rotation transformation computed by the Kabsch algorithm. If the minimum RMSD is not greater than the index of similarity, the two curves are considered similar and can be superimposed through the abovementioned rotation and translation transformation. Finally, a NURBS curve optimal matching combination method is proposed based on the binary search algorithm. While satisfying the matching similarity conditions, the method minimizes the number of expectation curve segments. We assume a 3D library of NURBS curves in which all the weights of the control points are set to 1. The index of similarity is set to 0.025, and the smallest search step is set to 0.05.
According to the proposed optimal matching method, the expectation curve is divided into ten sections that are respectively similar to corresponding primitives from the curve library. The combined curve containing the ten matched primitives is similar to the expectation curve. This paper presents a new method of NURBS curve optimal matching combination. Experimental results show that different expectation curves can be approximated by matching combination curves. The proposed method can be applied to solve the problem of fragment splicing reconfiguration. A smaller index of similarity means that the expectation curve may be divided into more segments; thus, selecting an appropriate index of similarity according to the actual situation is necessary. The effectiveness of the proposed method is verified by an experiment on 3D curve optimal matching combination.
Keywords: NURBS curve; Kabsch algorithm; index of similarity; optimal matching; combined curve
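The Kabsch step described in the abstract admits a compact closed form via SVD. A sketch under the paper's setting (paired point samples from two curves; a rotation plus translation minimizing the RMSD):

```python
import numpy as np

def kabsch(P, Q):
    """Optimal rotation R and translation t minimizing the RMSD
    between paired point sets P and Q (rows are points), i.e.
    minimizing ||(P @ R.T + t) - Q||."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)                 # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # correct for reflections
    D = np.diag([1.0] * (P.shape[1] - 1) + [d])
    R = Vt.T @ D @ U.T
    t = cq - R @ cp
    return R, t

def rmsd(P, Q):
    return np.sqrt(((P - Q) ** 2).sum(axis=1).mean())

# Rotate + translate a sampled curve, recover the transform, and
# check the residual RMSD is ~0, so the curves would be "similar"
# under any positive index of similarity such as 0.025.
theta = 0.7
R_true = np.array([[np.cos(theta), -np.sin(theta), 0],
                   [np.sin(theta),  np.cos(theta), 0],
                   [0, 0, 1]])
P = np.random.default_rng(1).normal(size=(20, 3))
Q = P @ R_true.T + np.array([1.0, -2.0, 0.5])
R, t = kabsch(P, Q)
residual = rmsd(P @ R.T + t, Q)
```

In the paper's pipeline, this residual RMSD is what is compared against the index of similarity (0.025) to accept or reject a candidate primitive.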
Abstract: The classification and recognition of images play an important role in a number of applications, such as image retrieval, object detection, and video content analysis. A major breakthrough has been achieved with deep convolutional neural network (CNN) models, which have surpassed previous state-of-the-art methods for image classification and recognition because the features extracted by CNN models are more discriminative and contain more semantic information than those of traditional approaches. However, CNN models such as Alex-Net and ZF-Net are extremely simple and incapable of extracting enough information for representing images, while other models such as VGG16/VGG19 and GoogLeNet have a huge number of neurons and parameters. In this work, a novel model named deep parallel cross CNN (PCCNN) is proposed, which can extract more effective information from images and has fewer neurons and parameters than those models. Inspired by the mechanism of human vision, which has two visual pathways and an optic chiasma, the proposed PCCNN is designed based on Alex-Net and extracts two groups of CNN features in parallel through a pair of deep CNN data transform flows. After the first fully connected layer in each stream, the information from the two streams is fused. The fused information is forwarded to the next two fully connected layers, and the output information is then fused again to obtain more powerful feature representations. Finally, for image classification, the Softmax regression function is applied to a 1024D image feature vector obtained from the fusion of the two feature groups. Note that Alex-Net is used as the base model because of its simple architecture and relatively small number of neurons. In the PCCNN model, the first stream is the original Alex-Net; in the second stream, a stride of 6 instead of 4 is used in the first convolutional layer.
A larger stride in the first convolutional layer yields worse performance if only a single stream is used because more information is missed. However, when the two streams are combined, the proposed model performs better than all the other models. In addition, because a larger stride is used in the second stream, its feature maps are smaller, and the number of neurons and parameters is not greatly increased. Some popular public datasets, namely Caltech101, Caltech256, and Scene15, are selected to evaluate the performance of our model. At the same time, some state-of-the-art models are implemented with the same settings for comparison. Experimental results demonstrate that the proposed PCCNN model achieves better image classification performance than these models, indicating that the features extracted with the PCCNN model are more discriminative and have stronger representation ability. On the Caltech101 dataset, the top-1 accuracy of the PCCNN model reaches approximately 63%, exceeding that of the VGG16 model by about 5% and that of the GoogLeNet model by about 10%. On the Caltech256 dataset, our model also performs better than the other models, with a top-1 accuracy of 46.4%, surpassing those of the VGG16 and GoogLeNet models by 5% and 2.6%, respectively. However, our model performs worse on the Scene15 dataset than GoogLeNet, although it still has higher accuracy than a single Alex-Net. The proposed PCCNN model outperforms several state-of-the-art CNN models in image classification and recognition, particularly on medium-scale datasets, but it does not exhibit better performance on the small-scale dataset. Hence, the model should be further tested on large-scale vision tasks, such as the ImageNet or SUN dataset, which the authors plan to do next.
In fact, the PCCNN model is not only applicable to image classification and recognition but also provides a novel way of thinking about deep CNN model design. In a deep CNN model, the deeper the architecture, the more neurons and parameters it has, and the higher its complexity. Thus, the width of the model can be increased instead to extract richer features and obtain better performance. Although this approach also increases the number of neurons and parameters, the rate of increase is slower than when more layers are added to a single model; furthermore, the resulting model is more in line with the human visual physiological mechanism. Finally, the PCCNN model has great extensibility.
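The parallel-stream idea can be caricatured in a few lines: two downsampling "streams" with strides 4 and 6 over the same input produce feature maps of different sizes, whose flattened outputs are concatenated for fusion. This is a toy NumPy stand-in for the two Alex-Net-style streams, not the actual network:

```python
import numpy as np

def stream(img, stride):
    """A stand-in for one convolutional stream: average-pool the
    image with the given stride and flatten to a feature vector.
    (In the paper each stream is a full Alex-Net-style CNN; the
    second stream simply uses stride 6 instead of 4 in its first
    convolutional layer.)"""
    h, w = img.shape
    pooled = img[:h - h % stride, :w - w % stride] \
        .reshape(h // stride, stride, w // stride, stride).mean(axis=(1, 3))
    return pooled.ravel()

img = np.random.default_rng(2).random((24, 24))
f1 = stream(img, stride=4)   # 6*6 = 36 activations
f2 = stream(img, stride=6)   # 4*4 = 16 activations: a smaller map,
                             # so the second stream adds few parameters
fused = np.concatenate([f1, f2])   # fusion after the parallel streams
```

The same size arithmetic explains why the second stream's larger stride keeps the parameter count of the combined model modest.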
Abstract: With the steady growth of the number of images publicly available on the Internet, adult image recognition is of great significance for ensuring web security and content monitoring. In this paper, single-person adult images are studied. Current skin-based adult image recognition algorithms usually have a high false positive rate. Here, an adult image recognition algorithm that takes the image of an adult torso as the region of interest (ROI) is proposed to reduce false positives efficiently. The proposed algorithm utilizes Poselets to detect ROIs with plenty of discriminative information. Each Poselet provides examples for training a linear SVM (support vector machine) classifier that can be run over the image in a multi-scale scanning mode, after which the outputs of these Poselet detectors vote for the localization of the torso and other body parts. Then, based on the ROIs, discriminative Fisher vectors for nude breast image classification are obtained. However, owing to variations in human body appearance, the ground truth positions and the outputs of the torso detector may shift. An adaptive algorithm is proposed to overcome this weakness. The algorithm selects several torso candidate areas according to the confidence values obtained by the torso detector; it then integrates the discrimination results of these areas to obtain the final result. To train the SVM classifier based on torso detection, a set of 30 000 pornographic images was collected, and the pornographic regions in the images were manually labeled. The labeling information can be used to generate the training data automatically. To evaluate the method, a new and large dataset is built, which includes adult, benign, and bikini images. Experiments on this dataset reveal that the proposed method obtains an accuracy rate of 91.7%, which is much higher than that of the traditional skin color-based method.
The Poselet-based torso detection method obtains more pornography-related information compared with other methods. Thus, the proposed method can detect adult images with a high detection rate and a low false positive rate, making it suitable for practical applications.
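The adaptive integration of several torso candidates can be sketched as a confidence-weighted vote over the top-k candidate areas. The exact fusion rule is an assumption for illustration; the abstract only states that the discrimination results of several areas are integrated:

```python
import numpy as np

def fuse_candidates(confidences, scores, k=3):
    """Adaptive fusion sketch: keep the k torso candidates with the
    highest detector confidence and average their classifier scores,
    weighted by that confidence."""
    idx = np.argsort(confidences)[-k:]          # top-k by confidence
    w = confidences[idx] / confidences[idx].sum()
    return float(np.dot(w, scores[idx]))

# Four candidate torso areas: detector confidences and the SVM
# scores of the Fisher-vector classifier on each area.
conf = np.array([0.9, 0.2, 0.7, 0.5])
svm_scores = np.array([1.4, -2.0, 0.6, -0.3])
fused = fuse_candidates(conf, svm_scores, k=3)
# A positive fused score would classify the image as adult content.
```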
Abstract: Visual object tracking has advanced significantly in recent years. However, variations in appearance caused by changes in scale, motion, shape deformation, or occlusion continue to pose challenges to object tracking. Therefore, an efficient appearance representation method plays a key role in improving the robustness of object tracking. In this paper, a tracking method based on mid-level cues is proposed. First, superpixel segmentation is performed on the training frames, in which the feature set and corresponding confidence values are taken as inputs and a discriminative appearance model is constructed via feature regression. When a new frame arrives, its feature set is fed into the model and the confidence of candidate regions is obtained, thereby separating the target from the background. The algorithm is evaluated on public datasets. Experimental results demonstrate that our algorithm can handle appearance changes, such as variations in scale, position, and illumination, shape distortion, and occlusion. Compared with state-of-the-art methods, our algorithm performs well in terms of center error and obtains the best results on the carScale, subway, and tiger1 sequences, with average center location errors of 12, 3, and 21 pixels, respectively. Compared with the same type of method, our algorithm is more efficient on all sequences and runs 32 times faster on the carScale sequence. Experimental results demonstrate the effectiveness and robustness of our tracking method under appearance changes. Only one kind of feature is applied in the proposed algorithm; thus, better features can be incorporated to further improve the tracking results.
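The feature-regression step above, mapping superpixel features to confidence values, can be sketched with ridge regression. The regressor choice is illustrative; the abstract does not specify this particular form:

```python
import numpy as np

def fit_appearance_model(features, confidences, lam=1e-2):
    """Ridge regression from superpixel feature vectors X to
    confidence values y: w = (X^T X + lam * I)^-1 X^T y."""
    X, y = features, confidences
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def predict_confidence(model, features):
    """Confidence of candidate regions in a new frame."""
    return features @ model

# Toy training set: 2-D superpixel features whose first channel
# carries the target/background evidence.
rng = np.random.default_rng(3)
X = rng.random((200, 2))
y = 2.0 * X[:, 0]                                  # synthetic confidence rule
w = fit_appearance_model(X, y + rng.normal(0, 0.01, 200))
conf = predict_confidence(w, np.array([[0.9, 0.5], [0.1, 0.5]]))
# The first candidate (target-like) should score higher than the
# second (background-like).
```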
Abstract: A marine video exhibits large-area textureless regions, and traditional video deshaking methods based on feature point detection and tracking for motion estimation usually perform poorly on such videos. In most marine videos captured on a ship, salient feature points are difficult to detect in the dominant water and sky regions, and the involved wave motions make the few detected feature points difficult to track. Thus, desirable deshaking results cannot be obtained by applying traditional video deshaking methods. Alternatively, this paper proposes a marine video deshaking method based on the estimation of steady optical flow, or SteadyFlow. The proposed algorithm is based on hierarchical block matching and integrates smoothness constraints to compute the flow motion of the corresponding hierarchical blocks, thus facilitating rapid and accurate computation of the approximate optical flow field of the marine video. The hierarchical blocks are organized into a pyramid with a few levels of blocks; at each level, the best-matched blocks are searched in the neighborhood under local smoothness constraints. The displacements of the blocks at the finest level form the optical flow motion of the marine video. Such a motion estimation scheme is more suitable for a marine video with large regions of water and sky. The estimated optical flow is then smoothed in a spatially and temporally consistent manner to obtain visually steady motion, where an energy functional optimization is applied to realize an efficient deshaking operation on the marine videos by using the steady optical flow. The proposed SteadyFlow-based deshaking algorithm is implemented and tested on many marine video examples captured on a ship, which have obvious shakiness caused by wave motions.
For optical flow estimation, different methods are compared quantitatively on public databases that provide ground-truth optical flow. For the visual deshaking effect, a set of shaky marine videos is collected. Then, using such data, the proposed method is run along with other comparative methods and sophisticated software for further comparison. A user study is conducted to qualitatively evaluate the performance of all evaluated methods in terms of marine video deshaking. The experimental results on running time statistics and visual quality comparison demonstrate that the proposed algorithm not only efficiently realizes marine video deshaking but also accelerates the process, reducing the running time by up to 70% compared with traditional methods. A video deshaking algorithm based on steady optical flow estimation is proposed to deal with the particular set of marine videos. Compared with some traditional methods and software, the method proposed in this paper is more suitable for processing marine video shakiness because it obtains more accurate motion estimation by using hierarchical block matching with smoothness constraints to compute the motion field of the marine videos.
Keywords: hierarchical block matching; steady flow; marine video deshaking; energy optimization
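One level of the hierarchical block matching described above can be sketched as an exhaustive SSD search; in the full method this runs coarse-to-fine over a block pyramid, with local smoothness constraints regularizing the displacements of neighboring blocks:

```python
import numpy as np

def match_block(prev, curr, y, x, bs, radius):
    """Find the displacement of the bs x bs block at (y, x) in `prev`
    by exhaustive SSD search over a (2*radius+1)^2 neighborhood of
    `curr`. One level of the hierarchical scheme; the displacements
    of all blocks at the finest level form the optical flow field."""
    block = prev[y:y + bs, x:x + bs]
    best, best_dv = np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            yy, xx = y + dy, x + dx
            if yy < 0 or xx < 0 or yy + bs > curr.shape[0] or xx + bs > curr.shape[1]:
                continue
            ssd = ((curr[yy:yy + bs, xx:xx + bs] - block) ** 2).sum()
            if ssd < best:
                best, best_dv = ssd, (dy, dx)
    return best_dv

# A frame globally shifted by (2, 3) pixels, as camera shake would
# produce, should yield exactly that displacement.
rng = np.random.default_rng(4)
prev = rng.random((32, 32))
curr = np.roll(np.roll(prev, 2, axis=0), 3, axis=1)
dv = match_block(prev, curr, 8, 8, bs=8, radius=4)
```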
Abstract: Image retargeting aims to automatically adjust the resolution of an image by non-homogeneously scaling image content and displaying it on a limited screen with well-preserved salient objects. A novel image retargeting method based on salient object detection is proposed to solve the partial distortion problem of salient objects. The proposed method utilizes the results of salient object segmentation instead of saliency maps to improve retargeting performance. First, a saliency map is generated using a saliency fusion and propagation strategy, which can balance the accuracy and completeness of salient regions. Then, based on the input image and the saliency map, a saliency cut method with adaptive triple thresholding is adopted to segment salient objects, which can generate salient objects with accurate boundaries. After this step, the curve-edge grid representation of the input image is constructed by finding the eight-connected seams with the maximum energy. Finally, the grids are non-uniformly scaled to satisfy the required size. Using manual evaluation, the performance of the proposed method is compared with those of ten typical methods on RetargetMe, a public dataset for image retargeting. Experimental results show that the proposed method can effectively reduce the partial distortion of salient objects in image retargeting and obtain retargeting results without obvious artifacts on 48.8% of the images, a rate that is 5% better than that of the best existing image retargeting method. The proposed image retargeting method based on salient object segmentation can improve the consistency of salient object processing, reduce the obvious artifacts caused by partial distortion of salient objects, and obtain good retargeting performance.
Abstract: Image retargeting is a technique that can flexibly display images with different aspect ratios on various devices. Among image retargeting algorithms, the as-similar-as-possible (ASAP) algorithm has high computing efficiency. In the ASAP algorithm, an energy function is minimized by solving a quadratic problem during 1D parameterization. However, in the ASAP algorithm, a significant part of an image with high saliency values may be deformed into a small size, and the background can be extremely stretched for several images. Based on the original algorithm, we propose an algorithm that avoids these problems. The original ASAP algorithm uses a quadratic equation to calculate the widths and heights of the grids of an image. We keep the quadratic format and add a new term to the equation. Aimed at ensuring a compatible change of widths and heights while resizing an image, as well as making each grid as large as possible, the added term is the sum of the areas of the grids, which prevents the image from being over-compressed and over-stretched while retaining efficiency. In terms of image retargeting quality assessment, the new algorithm exhibits good performance. We used a quality assessment algorithm as a scoring test to evaluate the similarity between the original and resized images. The results show that the proposed algorithm obtains higher scores than the ASAP algorithm. Without parts of the image being over-compressed or over-stretched, the output images look more reasonable and more information is preserved. In several cases, the scores increase by up to 39.0%. The new algorithm is not only an efficient content-aware image retargeting algorithm but also preserves more information from the original image than the ASAP algorithm.
Keywords: image processing; image retargeting; content-aware algorithm; as-similar-as-possible algorithm; quadratic problem
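The flavor of the 1D quadratic parameterization can be illustrated with a saliency-weighted least-squares resize of column widths under a total-width constraint. This is an illustrative energy with a closed-form Lagrange-multiplier solution, not the paper's exact formulation:

```python
import numpy as np

def resize_columns(widths, saliency, target):
    """Minimize sum_i s_i * (w_i - w0_i)^2 subject to sum(w_i) = target.
    Stationarity gives w_i = w0_i + mu / s_i, with the multiplier mu
    fixed by the constraint: salient columns (large s_i) stay near
    their original width, and low-saliency columns absorb the change."""
    w0 = np.asarray(widths, dtype=float)
    s = np.asarray(saliency, dtype=float)
    mu = (target - w0.sum()) / (1.0 / s).sum()
    return w0 + mu / s

# Shrink a 40-pixel-wide grid to 30 pixels: the two salient columns
# are barely touched, and the background columns are compressed.
w = resize_columns([10, 10, 10, 10], saliency=[5.0, 0.1, 0.1, 5.0], target=30)
```

The quadratic form is what keeps the problem solvable in closed form (or, in the full 2D grid case, as a sparse linear system), which is the source of the ASAP family's efficiency.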
Abstract: The compression, storage, and transmission of local feature descriptors are of remarkable importance in a wide range of image and video applications. In this paper, a hybrid framework of conventional image/video compression and compact feature representation is developed to make multimedia smarter for retrieval/analysis-driven applications. A multiple-reference prediction technique is introduced to remove redundancy from both the spatial-temporal domain and the texture features by efficiently leveraging the information in video coding. This yields a compact, discriminative, and efficient representation of feature descriptors. Furthermore, a rate-accuracy optimization technique, which aims to optimize performance in matching/retrieval-based applications, is introduced. Based on extensive simulations on different test databases, the proposed scheme significantly reduces the bitrate of visual features, toward a 150:1 compression ratio, while maintaining state-of-the-art matching/retrieval performance. The proposed system demonstrates its efficiency and effectiveness as a novel multimedia compression framework for various applications on smart devices.
Keywords: computer science and technology; visual search; video local feature descriptor; scale-invariant feature transform; high efficiency video coding
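The multiple-reference prediction idea, predicting each descriptor from a previously coded reference and coding only a quantized residual, can be sketched as follows. The function names and the uniform quantizer are assumptions for illustration, not the paper's actual codec:

```python
import numpy as np

def encode_descriptor(desc, references, step=0.05):
    """Pick the reference descriptor closest to `desc` among multiple
    candidates, then uniformly quantize the (small) residual.
    Returns the chosen reference index and the quantized residual."""
    dists = ((references - desc) ** 2).sum(axis=1)
    ref = int(np.argmin(dists))
    residual = np.round((desc - references[ref]) / step).astype(np.int32)
    return ref, residual

def decode_descriptor(ref, residual, references, step=0.05):
    return references[ref] + residual * step

rng = np.random.default_rng(5)
refs = rng.random((4, 8))                    # previously coded descriptors
desc = refs[2] + rng.normal(0, 0.01, 8)      # near-duplicate feature
ref, res = encode_descriptor(desc, refs)
rec = decode_descriptor(ref, res, refs)
```

Because the residual is small and sparse for temporally repeated features, it entropy-codes far more cheaply than the raw descriptor, which is where the large compression ratios come from.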
Abstract: The assessment of visual experience plays an important role in stereoscopic 3D animation production. In practice, visual experience assessment depends heavily on subjective evaluation by professional cinematographers. In this paper, an assessment model combining visual features and subjective scores is proposed. Visual comfort and stereo effect are the main indicators in visual effect assessment. Based on these two indicators, disparity features related to the visual comfort and stereo effect of the input animation scenes are extracted. Then, the support vector regression method is used to obtain the mapping functions from disparity features to visual comfort scores and from disparity features to stereo effect assessment scores. Validation experiments demonstrate that the proposed method can automatically predict the assessment of visual comfort and stereo effect on a five-level score scale. This method reveals the relationship between visual features and subjective assessment scores and can visually and efficiently assist animation cinematographers in stereo adjustment.