Abstract: Local image descriptors are widely used in image understanding and computer vision applications such as image classification, object recognition, image retrieval, robot navigation, and texture classification. The development of the SIFT algorithm marked the beginning of modern local image descriptor research. This study surveys recently developed local image descriptors. Four types of local image descriptors are presented: spatial distribution descriptors of local features, spatial correlation descriptors of local features, local descriptors based on machine learning, and extended local descriptors (local color descriptors, local RGB-D descriptors, and local space-time descriptors). The descriptors are then analyzed and categorized, and their performance is examined in terms of invariance, computational complexity, application field, evaluation methods, and evaluation datasets. Finally, the paper discusses directions for future research on local image descriptors. In recent years, research on local image descriptors has made great progress. Many excellent descriptors have been proposed; their distinctiveness, robustness, and real-time performance have greatly improved, and their application fields continue to expand. Local image descriptors are an important and fundamental research field of computer vision. However, many problems in the use of these descriptors persist, which indicates that further research on local image descriptors is required.
Abstract: To address the severe missed and false detection rates and the use of a fixed number of dominant colors in state-of-the-art dominant color extraction methods, an algorithm based on histogram peak filtering and rejection is proposed. The algorithm identifies a small number of dominant colors characterized by a high degree of spatial aggregation, many similar color pixels, a small representation error, and a large color difference. First, a robust color histogram of the input image is computed by thresholding the spatial aggregation degree of each pixel to avoid the influence of noise. The peaks of the color histogram within a relatively small color range are selected as the candidate dominant color set. Second, the color range used to select histogram peaks is progressively increased, and a candidate filtering process is applied to reduce similar colors in the candidate set. When two candidate dominant colors fall in the same selection color range, the one with more similar pixels is preserved. The other one is removed if it has a similar spatial distribution or more similar pixels in common; otherwise, it is retained. Finally, the dominant colors are determined through a candidate rejection process that removes false candidates, namely those with a scattered spatial distribution, a number of similar pixels that is smaller by an order of magnitude, and a small color difference from the other candidates in the set. In addition, a comprehensive evaluation model is established for the dominant colors of an image. The model reflects all factors that influence the dominant colors and avoids the one-sidedness of the traditional evaluation approach. Experimental results show that the proposed dominant color extraction method is a more effective representation of the color features of an image.
The average evaluation score is 1.1 times the highest score among the previous methods, a relative improvement of approximately 10%. The proposed algorithm meets the requirements of applications such as image retrieval, segmentation, and editing.
Abstract: To achieve a better edge-preserving effect in image smoothing, a novel smoothing algorithm is proposed that uses the sparsity of pixel intensity and gradient as dual constraints. A pixel intensity and gradient function based on the L0 norm is set as the constraint term of the smoothing model. Two auxiliary variables are introduced through a half-quadratic splitting strategy to construct the final smoothing model. The alternating minimization algorithm is then applied to solve the model, and a closed-form solution for the smoothed image is obtained in the Fourier frequency domain to accelerate the algorithm. Smoothing experiments on natural images show that the proposed algorithm better meets the requirements of edge preservation, denoising, and real-time application; it requires only 3.42 s and is 7.85 s faster than the bilateral filtering algorithm. The experiments demonstrate that the proposed algorithm outperforms other smoothing algorithms. The algorithm removes unimportant details while retaining image edge features. Thus, it is applicable to smoothing, denoising, and boundary enhancement of images with complex backgrounds.
Keywords: image smoothing; intensity and gradient; sparsity; alternating minimization
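The half-quadratic splitting and Fourier-domain update mentioned in the abstract can be illustrated with a minimal sketch. Note the assumptions: this is the classic gradient-only L0 smoothing scheme applied to a 1D signal with circular boundaries, not the paper's dual intensity-and-gradient model, and the function name `l0_smooth_1d` and the parameters `lam`, `kappa`, and `beta_max` are hypothetical choices for illustration.

```python
import numpy as np

def l0_smooth_1d(signal, lam=0.02, kappa=2.0, beta_max=1e5):
    """Gradient-only L0 smoothing of a 1D signal (illustrative sketch).

    Alternates between a hard-threshold step on the auxiliary gradient
    variable h and a closed-form update of the signal in the Fourier domain.
    Assumes circular boundaries and at least two samples.
    """
    f = np.asarray(signal, dtype=float)
    n = f.size
    # Circular forward difference: (D s)[i] = s[i + 1] - s[i]
    kernel = np.zeros(n)
    kernel[0], kernel[-1] = -1.0, 1.0
    D = np.fft.fft(kernel)
    F = np.fft.fft(f)
    s = f.copy()
    beta = 2.0 * lam
    while beta < beta_max:
        # h-step: keep a gradient only if it is worth its L0 cost
        grad = np.roll(s, -1) - s
        h = np.where(grad ** 2 > lam / beta, grad, 0.0)
        # s-step: quadratic problem solved exactly in the frequency domain
        numer = F + beta * np.conj(D) * np.fft.fft(h)
        s = np.real(np.fft.ifft(numer / (1.0 + beta * np.abs(D) ** 2)))
        beta *= kappa
    return s

# Hypothetical test signal: a unit step with a small ripple
n = 100
clean = np.where(np.arange(n) < 50, 0.0, 1.0)
noisy = clean + 0.05 * np.sin(np.arange(n))
smoothed = l0_smooth_1d(noisy)
```

Because the zero-frequency term of the difference operator is zero, the update leaves the signal mean unchanged, while the thresholding step suppresses small gradients and keeps the large step.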
Abstract: The fundamental issue in multi-modality information cross-retrieval is the feature representation of multi-modality data. Sparse coding is an effective representation method for feature modeling. However, when the query terms and the retrieved terms come from different modalities, traditional sparse coding may not be suitable, because the distribution difference between modalities can cause similar features to be encoded with significantly different sparse representations. Therefore, this paper presents a multi-modality information cross-retrieval algorithm based on sparse coding. In the proposed method, maximum mean discrepancy (MMD) and the graph Laplacian are used to formulate the sparse coding objective function so as to thoroughly exploit the multimodal information in coding. Feature-sign search and a discrete line search algorithm are then used to optimize the objective function. A cross-retrieval experiment was performed on a Wikipedia text-image dataset, and the proposed method was compared with traditional sparse coding methods. The experimental results show that the proposed method increases the average mean average precision (MAP) of cross-retrieval by 18.7%. The proposed algorithm improves the robustness of sparse coding and the accuracy of multimodal cross-retrieval, and is more suitable for extracting features of multimodality data for further operations such as cross-retrieval and classification.
Keywords: multi-modality; cross-retrieval; sparse coding; maximum mean discrepancy; graph Laplacian
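As a rough illustration of the maximum mean discrepancy term, the sketch below computes the squared MMD with a linear kernel, which reduces to the squared distance between the mean codes of two modalities. The linear kernel, the function name `linear_mmd2`, and the toy code matrices are assumptions for illustration; the abstract does not state which kernel or estimator the paper uses.

```python
import numpy as np

def linear_mmd2(x, y):
    """Squared MMD with a linear kernel: ||mean(x) - mean(y)||^2.

    x and y hold one sample per row (e.g. sparse codes of two modalities).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    diff = x.mean(axis=0) - y.mean(axis=0)
    return float(diff @ diff)

# Hypothetical sparse codes for two modalities
image_codes = np.array([[1.0, 0.0], [3.0, 2.0]])
text_codes = np.array([[0.0, 1.0], [2.0, 1.0]])

gap = linear_mmd2(image_codes, text_codes)    # means differ by (1, 0)
same = linear_mmd2(image_codes, image_codes)  # identical distributions
```

A value near zero indicates that the two sets of codes have similar means, which is the kind of cross-modality agreement such a penalty encourages.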
Abstract: A method for tracking moving straight lines in image sequences, based on a prediction mechanism that uses point and straight-line optical flow, is proposed for the problem of moving straight-line feature tracking in image sequences of complex scenes. The basic constraint equation of the point and straight-line optical flow is defined; the equation is derived from the image straight-line expression. Three significant corollaries on the correspondence between the point and straight-line optical flow are deduced from this basic constraint equation. Then, using this correspondence, the straight-line optical flow is estimated with the defined computing model through a mean filtering strategy. With the estimated straight-line optical flow, moving straight lines are extracted using a given straight-line optical flow threshold. Finally, from the extracted moving straight line and the estimated straight-line optical flow, the predicted coordinate of the moving straight line in the next frame is obtained. The tracking result is obtained by selecting the minimum Euclidean distance between the predicted coordinate of the moving straight line and the actual coordinates in the next frame within a neighborhood window of the Hough space. To demonstrate the performance of the proposed method, three elaborate experiments on synthetic and real image sequences are conducted. The experiments demonstrate the effects of optical flow estimation, the straight-line optical flow threshold, and the size of the straight-line matching window. Experimental results show that the proposed method can accurately extract moving straight lines and can track and match them stably.
No interfering straight lines are wrongly matched in the tracking results, and the time consumption of the method does not exceed 12 s. Compared with traditional moving straight-line tracking and matching methods, the proposed method has higher tracking accuracy and better robustness, which indicates that it is more applicable to moving straight-line tracking in complex scenes.
Abstract: A tracking-by-detection multi-target tracking algorithm is proposed based on feature fusion and discriminative appearance models. To identify the responses of the detector in each sliding window, a dual-threshold strategy is adopted to perform low-level association and generate reliable tracklets between adjacent frames. Training samples are collected from these tracklets. Then, several features are merged to robustly describe the training samples, and the Adaboost algorithm is used to train the classifier, i.e., the discriminative appearance model, online. Finally, the discriminative appearance model is used in an iterative process to link the tracklets into longer ones and form the final complete target trajectories. Experimental results on four challenging databases (TUD-Stadtmitte, TUD-Campus, TUD-Crossing, and Town-Center) show that the proposed method can efficiently deal with occlusions, target deformation, and background interference. The tracking results on the TUD-Crossing database are quantitatively analyzed, and the performance metrics of the algorithm are as follows: the FAF is 0.21, the MT is 84.6%, the ML is 7.7%, the Frag is 9, and the IDS is 4. The proposed method outperforms several state-of-the-art approaches in terms of FAF and Frag. The multi-feature fusion is appropriate for target representation, and the discriminative appearance model is effective for tracklet association. The proposed algorithm exhibits satisfactory performance in complex scenes and can be further applied as a preprocessing step for higher-level algorithms, such as trajectory retrieval in behavior recognition.
Abstract: Visual object tracking is a process that continuously infers the state of a target in unconstrained scenes. It is commonly formulated as a search (or classification) problem that aims to identify the candidate that best matches the target template as the tracking result. The target template is maintained over time and updated online once the tracking result is available. Prior to tracking at the current time, a set of candidates is sampled around the state of the target at the previous time. Both the target template and the candidates are represented by an appearance model. Then, a target searching strategy is employed to find the candidate that best matches the template as the tracking result. Although several excellent visual tracking methods exist, this area remains an active research topic because of several unresolved challenges that arise from both template learning and appearance modeling. From the point of view of appearance modeling, exploiting several representative templates from online data is the core problem and plays a key role in complex scenes where the target state changes significantly over time. In the proposed tracking framework, several low-dimensional basis vectors, called positive templates, are learned from high-dimensional online data by using the online PCA algorithm. Several negative templates are then sampled according to the last tracking result. The most representative object templates are organized by combining the positive and negative templates, and the target candidate is represented by the online-learned integrative templates with additive Gaussian-Laplacian noise. Finally, the maximum likelihood between the target candidate and the real object is estimated. Thus, the tracker can capture accurate information on the real object in each frame. A suitably arranged template update strategy is used to enhance the object templates during tracking.
The online integrative templates exploit more comprehensive information on the target object than the positive-template-only learning approach, because the online negative template expansion operation produces strong discrimination between the target candidate and the background data. In other words, positive templates help the tracker find the most likely target, while negative templates actively represent the background data to help the tracker avoid drifting. Thus, the tracker retains a good capability to identify the most likely target candidate. Extensive experiments are conducted to validate the new algorithm. The tracker can learn several comparative object templates, update itself at a fixed period, adapt well to variations caused by intrinsic or extrinsic factors (pose, illumination, occlusion, scaling, background clutter, motion blur, etc.), and maintain favorable performance. Although template learning with online PCA is a widely used feature extraction method for computer vision problems (e.g., visual object tracking) and its learned templates contain some representative information on the target object, these templates are not sufficiently representative and need to be enhanced with additional information on the object to adapt well to uncertain complex variations. In this paper, two core issues in visual object tracking (online template learning and appearance modeling) are studied. Detailed descriptions of an efficient template organization strategy and an accurate model representation technique are provided, and a novel visual object tracking framework is proposed. The proposed algorithm can automatically exploit several useful integrative templates of the object from online data and update itself. Hence, the model representation exhibits strong robustness and improved tracking accuracy.
Experiments on many challenging image sequences demonstrate that the proposed method achieves results comparable to, or even better than, those of several state-of-the-art tracking algorithms.
Abstract: With the development of digital imaging technologies, images can now be tampered with easily. Copy-move forgery is one of the most common and most easily applied tampering techniques. To make the tampered image look normal, the copied region may be subjected to various post-processing operations. However, most existing methods for detecting altered regions are too sensitive to these post-processing operations. To address this weakness, a detection algorithm based on exponential Fourier moments is proposed in this paper. First, a grayscale image is divided into multiple overlapping blocks. Then, the exponential Fourier moments of every block are regarded as a feature vector, and all vectors are sorted lexicographically. Suspicious blocks are selected based on vector similarity and block displacement. Finally, falsely matched similar blocks are removed using the number of neighbors and the variance of the angles to locate the final tampered region. If an RGB image is examined, each color channel can be processed independently to obtain three results, and the final result is obtained by performing an AND operation. Most existing copy-move forgery detection methods convert an RGB image into a grayscale image, which leads to information loss. For this reason, each color channel is examined separately to produce three independent outcomes, and the results are then integrated. To make the method more robust, the consistency of the pasted region is used to remove falsely matched similar blocks. Experimental results show that the method is robust against post-processing operations such as rotation, noise addition, and Gaussian blur. Compared with the method that uses radial harmonic Fourier moments, the proposed method is more effective when noise is added to the image. The detection rate increased by about 26.66%, and the error rate decreased by about 33.77%.
A user can easily create a convincing image by copying and pasting content within the same image. The proposed method can detect and locate the duplicated regions. Moreover, the method remains effective even when an image is distorted by rotation, additive noise, or Gaussian blurring.
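The block-matching pipeline described in this abstract (overlapping blocks, lexicographic sorting of feature vectors, and selection by similarity and displacement) can be sketched as follows. This is a simplified illustration under stated assumptions: raw block pixels stand in for exponential Fourier moments, so it has none of the paper's robustness to rotation or noise, and the neighbor-count and angle-variance filtering is omitted. The function name `find_duplicate_blocks` and the parameters `block`, `min_shift`, and `tol` are hypothetical.

```python
import numpy as np

def find_duplicate_blocks(gray, block=4, min_shift=4, tol=1e-6):
    """Return pairs of top-left corners of near-identical blocks.

    Raw pixel values are used as the block feature (an assumption made
    to keep the sketch short; the paper uses exponential Fourier moments).
    """
    img = np.asarray(gray, dtype=float)
    h, w = img.shape
    feats, coords = [], []
    for r in range(h - block + 1):
        for c in range(w - block + 1):
            feats.append(img[r:r + block, c:c + block].ravel())
            coords.append((r, c))
    feats = np.array(feats)
    # Lexicographic row sort; np.lexsort treats the LAST key as primary,
    # so the feature columns are passed in reverse order.
    order = np.lexsort(feats.T[::-1])
    pairs = []
    for a, b in zip(order[:-1], order[1:]):
        similar = np.linalg.norm(feats[a] - feats[b]) <= tol
        shift = np.hypot(coords[b][0] - coords[a][0],
                         coords[b][1] - coords[a][1])
        # Keep similar blocks that are far enough apart to be a real copy
        if similar and shift >= min_shift:
            pairs.append((coords[a], coords[b]))
    return pairs

# Hypothetical forged image: copy the top-left 4x4 patch to the bottom-right
image = np.arange(144, dtype=float).reshape(12, 12)
image[8:12, 8:12] = image[0:4, 0:4]
matches = find_duplicate_blocks(image)
```

Sorting places similar feature vectors next to each other, so only adjacent rows need to be compared instead of every pair of blocks.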
Abstract: A two-stage identification method based on sketch entity identification and the perceptual hashing technique is proposed to overcome the strong randomness and excessive freedom of handwritten sketches and to balance the overall properties and local characteristics of a sketch. First, the geometrical characteristics of the strokes, the stroke order, and the stroke structural characteristics of the input handwritten sketch are extracted. Second, a semantic library of sketches containing information on entity, stroke structure, and stroke order is searched to recognize a sketch composed of regular geometric entities. If no proper sketch is available in the library, an image of the sketch is generated and the sketch image is recognized with perceptual hashing technology. With the sketch recognition method proposed in this paper, 150 types of sketches in a database were recognized, with an average recognition rate of 82.6%. Experimental results show that the proposed method has a high recognition rate for handwritten sketches in the database that are input by different users. The method can also be extended to identify other sketches by adding sketch types to the semantic library of sketches and the database.
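The perceptual hashing fallback can be illustrated with an average-hash sketch: the image is reduced to a small grid of block means, each cell is compared with the overall mean to give one bit, and two hashes are compared by Hamming distance. The abstract does not say which perceptual hash the paper uses, so the average hash, the names `average_hash` and `hamming_distance`, and the toy images are assumptions. The input is assumed to be a 2D grayscale array at least `hash_size` pixels on each side.

```python
import numpy as np

def average_hash(image, hash_size=8):
    """Average hash: one bit per grid cell, set if the cell mean exceeds the global mean."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    bh, bw = h // hash_size, w // hash_size
    # Crop so the image divides evenly into hash_size x hash_size cells
    img = img[:bh * hash_size, :bw * hash_size]
    cells = img.reshape(hash_size, bh, hash_size, bw).mean(axis=(1, 3))
    return (cells > cells.mean()).ravel()

def hamming_distance(hash_a, hash_b):
    """Number of differing bits between two hashes."""
    return int(np.count_nonzero(hash_a != hash_b))

# Hypothetical sketch images: right half bright, a contrast-changed copy,
# and a transposed version (bottom half bright)
sketch = np.zeros((16, 16))
sketch[:, 8:] = 1.0
faded = 0.5 * sketch + 0.25
rotated = sketch.T

d_same = hamming_distance(average_hash(sketch), average_hash(faded))
d_diff = hamming_distance(average_hash(sketch), average_hash(rotated))
```

A small distance suggests the same sketch under a brightness or contrast change, while a large distance suggests a different drawing.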
Abstract: Missing geometry appears during mesh reconstruction and editing, so completing the holes is important. To complete the holes of a complex surface effectively, a mesh completion method based on loop-driven spherical coordinates and iterative closest point (ICP) registration with curvature and normal information is presented. First, the user searches for a similar mesh patch and places it around the hole of the mesh. Second, a B-spline curve is used to fit the hole boundary of the target, and the boundary loop of the source mesh is placed on the B-spline. Loop-driven spherical coordinates, which map and deform the source mesh patch to match the target mesh, are then constructed. Finally, Laplacian smoothing and the ICP algorithm with curvature and normal information are applied to finish the mesh completion, so that the two mesh parts are merged smoothly. Experimental results show that the proposed algorithm can recover a missing feature effectively and smoothly. The loop-driven spherical coordinates avoid the need for cage meshes in the deformation, and the ICP iteration completes the registration effectively. Compared with earlier approaches, the proposed algorithm can recover a missing feature effectively.
Keywords: mesh completion; loop-driven spherical coordinates; iterative closest point (ICP) iteration; Laplacian smoothing
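The Laplacian smoothing step can be sketched with the simple uniform-weight (umbrella) form, in which each free vertex moves part of the way toward the centroid of its neighbors. The abstract does not specify the weighting scheme, so uniform weights, the function name `laplacian_smooth`, and the parameters `step`, `iterations`, and `fixed` are assumptions for illustration.

```python
import numpy as np

def laplacian_smooth(vertices, neighbors, step=0.5, iterations=1, fixed=()):
    """Uniform Laplacian smoothing of mesh vertices.

    vertices  : (n, 3) array-like of positions
    neighbors : dict mapping a vertex index to a list of adjacent indices
    fixed     : indices that must not move (e.g. the hole boundary)
    """
    v = np.asarray(vertices, dtype=float).copy()
    fixed = set(fixed)
    for _ in range(iterations):
        updated = v.copy()
        for i, nbrs in neighbors.items():
            if i in fixed or not nbrs:
                continue
            centroid = v[list(nbrs)].mean(axis=0)
            # Move the vertex a fraction of the way toward the centroid
            updated[i] = v[i] + step * (centroid - v[i])
        v = updated
    return v

# Hypothetical patch: four fixed corners and one raised center vertex
patch = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0), (0.5, 0.5, 1.0)]
adjacency = {4: [0, 1, 2, 3]}
smoothed_patch = laplacian_smooth(patch, adjacency, fixed=(0, 1, 2, 3))
```

Keeping the boundary vertices fixed lets the interior of the inserted patch relax without disturbing the seam with the surrounding mesh.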
Abstract: Research on speech-synchronized tongue animation remains scarce. Against this background, this paper proposes a physiology-based tongue animation system. First, an accurate physiology-based tongue model is created, whose deformation can be driven by muscle activations. Second, the model is used to produce numerous tongue deformation samples from numerous designed muscle activations. With these samples, a neural network that transforms muscle activations into tongue deformation is trained. Then, from the 2D tongue deformation results on tongue X-ray data, the corresponding physemes (muscle activation and rigid movement sequences) are estimated with this neural network. Lastly, speech-synchronized tongue animation is synthesized by inputting these physemes into the tongue model for simulation. Experimental results demonstrate that the proposed system can produce realistic-sounding voices and visually realistic speech-synchronized tongue animation. The system can be used to build a phoneme-physeme database from collected 2D tongue movement data for Mandarin Chinese or other languages and can synthesize highly realistic tongue animation for the corresponding language.
Abstract: Following the rapid development of network and real-time rendering technology, 3D demonstration on mobile terminals can now provide remote, interactive, real-time model rendering. However, increased computing complexity and data scale affect the quality and real-time performance of 3D demonstration. Hence, a distributed parallel rendering method for mobile terminals is proposed. The method renders a model on both the server and client sides. The server side uses levels of detail technology to control scene complexity and generates the primary rendered frame views. The client side employs image-based rendering technology to re-render the image and further improve rendering quality. Compute unified device architecture (CUDA) parallel computing is employed to accelerate data rendering. Experiments show that the method improves rendering speed, reduces the size of transmitted data, and further improves image quality, frame rate, and data traffic optimization by approximately 10.8%. Hence, a valuable solution for 3D demonstration on mobile terminals is provided. In mobile networks, the proposed method can reduce server-side load and improve resource utilization and customer experience.
Keywords: distributed rendering; levels of detail; image-based rendering; compute unified device architecture (CUDA)
Abstract: At its present stage, computer-generated pencil drawing cannot approximate reality, primarily because the tone in real pencil drawing shows more obvious changes across dark, grey, and bright layers than a greyscale image does. To address this problem, a novel automatic pencil drawing generation method is proposed. Instead of the traditional gradient image method, the new method combines a LAB space color difference image and a texture image as features for outline extraction. It also allows the texture's constant hue to change in the rendering process. An arctangent hue differentiation model is likewise proposed to reflect the layered hue effect. The core idea is to adjust the pixel grey value, based on the pixel ratio of the dark, grey, and bright layers, to achieve layered tones. The new method avoids the impact of noise and displays details better. It not only resolves the sudden color change of direct stratification but also integrates brightness stratification with continuous hue changes. The final results show that the new algorithm has two advantages over other algorithms. First, the outline is more continuous and clear. Second, the texture effect achieves both consistency and layering, which other algorithms cannot achieve. The proposed algorithm can be applied to both color and grey images of various resolutions; the higher the resolution, the better the effect.
Keywords: line integral convolution (LIC); LAB space color differentiation image; texture contrast; arctangent hue differentiation model
Abstract: Effectiveness and realism are the essential issues of crowd path planning in crowd simulations. Existing path planning algorithms are limited when applied to large-scale simulation and ignore the diverse path preferences caused by psychological factors. In this paper, we propose a real-time emotion-integrated path planning algorithm (EPP). Based on personality theory, we build an emotion model for crowds and set diverse path preferences for different emotions. For path modeling, we construct a global directed navigation graph with a single-step global search to identify the available global paths. For path search, an objective function based on the least expected time principle is presented. With this objective function, a real-time local search is employed to determine the optimal or suboptimal solution. Experiments show that the proposed approach can effectively simulate path planning for large-scale crowds in different scenes. Compared with previous algorithms, EPP is more effective and efficient. The robustness of the proposed approach is further validated by discussing the differences in crowd path planning under different emotional states. A compatibility experiment is also conducted by integrating the proposed algorithm into different crowd movement models. The proposed approach is highly effective and efficient and can be adopted for applications with large-scale crowds and diverse scenarios.
Abstract: Land cover classification can provide important information for ecosystem, water resource, and climate models. Remote sensing technology has many advantages in land cover classification because of its continuous spatial coverage and continuous observation over time. The Landsat-5 TM and Landsat-7 ETM+ sensors, which are important remote sensing data sources for regional land cover classification, have failed successively. Landsat-8 continues the earth observation mission of the Landsat series. The OLI sensor of Landsat-8 has several new characteristics, including an added deep blue band and cirrus band, a narrower spectral range for the near-infrared band, and increased radiometric resolution and signal-to-noise ratio. This study investigates a method of land cover classification in Beijing using Landsat-8 OLI data and discusses its feasibility. First, a land cover classification system suitable for the study area and the spatial resolution of the OLI sensor is determined, and the Landsat-8 multispectral images that cover the whole area of Beijing are preprocessed, including atmospheric correction (using the 6S model), topographic correction (using the C model), image mosaicking, and extraction. Then, texture images of the panchromatic band (at four different scales) are extracted using the gray-level co-occurrence matrix, and the texture images are resampled to obtain texture features. To improve classification accuracy, the texture features are fused with the multispectral data, and land cover is classified using a support vector machine. Finally, accuracy is evaluated using a confusion matrix, and the overall accuracy and Kappa coefficient are determined for classified images that use spectral features only and for classified images that use both spectral and texture features.
The results of the study are as follows: (1) With regard to the preprocessing of Landsat-8 OLI data, atmospheric correction using the 6S model and topographic correction using the C model improve the class separability between different land cover types to varying degrees. (2) With regard to the use of texture features, adding the texture information of the Landsat-8 panchromatic band effectively improves the classification accuracy of some land cover types (such as forest, crop, building, and bare land); the overall classification accuracy is improved by 2.8%, and the Kappa coefficient is improved by 0.0336. (3) With regard to extracting the texture features of the Landsat-8 panchromatic band, a 5×5 window is the most suitable scale compared with 3×3, 7×7, and 9×9 windows. Compared with Landsat TM/ETM+ data, the new characteristics of the Landsat-8 OLI data help promote its use in remote sensing land cover classification. The proposed method is suitable for research and application of land cover classification using Landsat-8 OLI data and can satisfy the requirements for land cover classification in large regions.
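The accuracy evaluation mentioned in this abstract relies on two standard quantities derived from a confusion matrix: the overall accuracy and Cohen's Kappa coefficient. The sketch below shows how they are computed; the function name `accuracy_and_kappa` and the two-class matrix are hypothetical and are not data from the study.

```python
import numpy as np

def accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's Kappa from a square confusion matrix.

    Rows are reference classes and columns are predicted classes
    (the result is the same if the convention is transposed).
    """
    m = np.asarray(confusion, dtype=float)
    total = m.sum()
    observed = np.trace(m) / total
    # Agreement expected by chance from the row and column marginals
    expected = (m.sum(axis=0) * m.sum(axis=1)).sum() / total ** 2
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa

# Hypothetical two-class confusion matrix (not from the study)
example = [[45, 5],
           [10, 40]]
overall_accuracy, kappa = accuracy_and_kappa(example)
# overall accuracy = 85 / 100 = 0.85, chance agreement = 0.5, Kappa = 0.7
```

Kappa discounts the agreement expected by chance, which is why it is reported alongside overall accuracy when comparing the spectral-only and spectral-plus-texture classifications.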