Abstract: A fast wavelet transform with a high compression ratio causes a serious blocking effect in the self-organizing feature mapping (SOFM) algorithm and poor image restoration quality. To address this problem, the RSOFM-C vector quantization algorithm is proposed, in which relay neurons are introduced into the neural network. The relay neurons address the problem of uneven code word distribution. A Euclidean distance discriminant inequality is given in the middle layer of the neural network: neurons that fail to satisfy the distortion measure are excluded, which reduces repeated calculation and accelerates learning. The SOFM-C algorithm and the fast wavelet transform are combined according to the difference-signal coding principle of DPCM, and the low-frequency image signal is further compressed by the RSOFM-C algorithm. In the simulation experiment, the proposed algorithm is compared with similar compression methods. At a 52% compression ratio, the peak signal-to-noise ratio of this method reaches 39.28 dB, which is higher than that of the other methods. The compression algorithm eliminates the blocking effect, and a high-quality reconstructed image is obtained while a high compression ratio is maintained. The experiment shows that the relay-neuron-based fast wavelet compression method can compress images with high compression ratio, fidelity, and speed.
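The early exclusion of codebook entries via a Euclidean distance inequality resembles a classic partial distance search. Below is a minimal sketch of that idea, assuming squared Euclidean distortion and a generic NumPy codebook; it illustrates the early-rejection inequality only and is not the authors' RSOFM-C implementation.

```python
import numpy as np

def nearest_codeword_pds(x, codebook):
    """Partial distance search: stop accumulating the squared Euclidean
    distance as soon as it exceeds the best distance found so far."""
    best_idx, best_dist = -1, np.inf
    for i, c in enumerate(codebook):
        d = 0.0
        for xj, cj in zip(x, c):
            d += (xj - cj) ** 2
            if d >= best_dist:      # discriminant inequality: reject this codeword early
                break
        else:                       # loop finished without breaking: new best match
            best_idx, best_dist = i, d
    return best_idx, best_dist

# Usage: quantize a 4x4 image block against a small random codebook.
rng = np.random.default_rng(0)
codebook = rng.random((256, 16))
block = rng.random(16)
idx, dist = nearest_codeword_pds(block, codebook)
```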
Abstract: In the traditional de-noising model, considering only image de-noising and edge protection leads to the loss of detailed information. To address these shortcomings of the traditional models, we present an image de-noising model based on a matching normal distribution. The proposed model builds on the classical anisotropic diffusion model. The effect of the diffusion coefficient in the diffusion process is first analyzed, and the normalized flux function is introduced into the construction of a new diffusion coefficient, yielding the novel diffusion model. The newly established model balances de-noising performance with the protection of the edges and texture of the image. On this basis, a further model is proposed in which the diffusion coefficient takes the form of a normal distribution function. Simulation results indicate that the peak signal-to-noise ratio is improved by 28 dB, the mean square error decreases sharply, the image edges are clearer, and the contrast is enhanced markedly. The proposed model handles the diffusion process while maintaining good de-noising performance and edge protection. The detailed texture information is preserved satisfactorily, and the peak signal-to-noise ratio is improved drastically. Therefore, the performance of the proposed model is better than that of the classical model.
Keywords: matching normal distribution; flux function; diffusion coefficient; image smoothing
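For context, the following is a minimal sketch of the anisotropic (Perona-Malik-type) diffusion iteration that such models build on, using a Gaussian-shaped diffusion coefficient as a stand-in for the matching-normal-distribution coefficient; the parameters k, lam, and n_iter and the periodic boundary handling are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def anisotropic_diffusion(img, n_iter=20, k=0.1, lam=0.2):
    """Perona-Malik-style diffusion with a Gaussian-shaped coefficient
    g(s) = exp(-(s/k)^2) applied to the gradient in each direction."""
    u = img.astype(float).copy()
    for _ in range(n_iter):
        # differences to the four neighbours (periodic boundaries via np.roll, for brevity)
        dn = np.roll(u, -1, axis=0) - u
        ds = np.roll(u,  1, axis=0) - u
        de = np.roll(u, -1, axis=1) - u
        dw = np.roll(u,  1, axis=1) - u
        u += lam * (np.exp(-(dn / k) ** 2) * dn + np.exp(-(ds / k) ** 2) * ds
                    + np.exp(-(de / k) ** 2) * de + np.exp(-(dw / k) ** 2) * dw)
    return u
```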
Abstract: Traditional fractal image coding is highly time consuming; however, restricting the search with sub-block features is an effective way to accelerate encoding. This study proposes a new sub-block feature function to accelerate coding and improve coding performance. A theorem is proved that unifies the theoretical approaches of previous works and describes the relationship among trace length, trace distribution, and encoding performance. The advantages of several previous features are compared, and a new block feature is defined on the basis of the theorem. Experimental results show that the proposed algorithm improves on the main diagonal sum feature and the local cross trace feature in coding performance: it needs a shorter encoding time at the same PSNR and achieves a higher PSNR at the same encoding time. Within the same search radius, the algorithm finds a larger number of optimal matching blocks. That an R-block is a neighbor of a D-block according to the block feature is a necessary but not sufficient condition for obtaining the best matching error. When 10% of the codebook capacity is searched, only approximately 25% of all R-blocks find their best matching blocks; when half of the codebook capacity is searched, this proportion reaches approximately 80%. The proposed algorithm offers better coding performance and decoded image quality than the main diagonal sum and local cross trace features.
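The kind of feature-restricted codebook search described above can be sketched as follows; the main diagonal sum (np.trace) stands in for the new feature proposed in the paper, blocks are assumed to be equally sized NumPy arrays, and a plain mean squared error replaces the full affine fractal matching error for brevity.

```python
import numpy as np

def feature_neighbourhood_search(r_blocks, d_blocks, radius=50,
                                 feature=lambda b: np.trace(b)):
    """For each range (R) block, search only the domain (D) blocks whose
    feature value is closest to the R-block's, within a fixed radius in
    the feature-sorted order of the codebook."""
    d_feat = np.array([feature(b) for b in d_blocks])
    order = np.argsort(d_feat)                  # codebook sorted by feature value
    sorted_feat = d_feat[order]
    matches = []
    for r in r_blocks:
        pos = np.searchsorted(sorted_feat, feature(r))
        lo, hi = max(0, pos - radius), min(len(order), pos + radius)
        cand = order[lo:hi]                     # restricted candidate set
        errs = [np.mean((r - d_blocks[i]) ** 2) for i in cand]
        matches.append(int(cand[int(np.argmin(errs))]))
    return matches
```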
Abstract: The performance of the Bag-of-Words model in image classification is limited mainly by the quantization error of local features. To reduce this quantization error effectively, an image classification method based on global coding combined with a multi-scale codebook is proposed. Global coding is implemented by fully exploiting the manifold structure of the image features and by computing the global information of the codebook; the coding coefficients obtained by the method are relatively smooth and accurate. Furthermore, a multi-path method is designed to integrate all feature representations to describe the image, which achieves scale invariance of the feature representations to a certain extent. Experiments are conducted on two commonly used benchmark data sets, namely, UIUC-8 and Caltech-101, and the average classification accuracy rates reach 88.0% and 83.2%, respectively. Experimental results show that the proposed method improves performance significantly compared with fixed-scale locality-constrained coding methods.
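For reference, a minimal sketch of the fixed-scale locality-constrained coding (LLC) baseline that the proposed global coding is compared against is given below; the codebook, neighbourhood size k, and regularization eps are generic assumptions.

```python
import numpy as np

def llc_code(x, codebook, k=5, eps=1e-4):
    """Approximate locality-constrained linear coding of one local descriptor x
    against a fixed-scale codebook (rows are codewords)."""
    dists = np.sum((codebook - x) ** 2, axis=1)
    nn = np.argsort(dists)[:k]                  # k nearest codewords
    z = codebook[nn] - x                        # shift codewords to the descriptor
    C = z @ z.T                                 # local covariance
    C += eps * np.trace(C) * np.eye(k)          # regularization for stability
    w = np.linalg.solve(C, np.ones(k))
    w /= w.sum()                                # coefficients sum to one
    code = np.zeros(len(codebook))
    code[nn] = w                                # sparse code over the full codebook
    return code
```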
Abstract: The K-D tree and the Shell are commonly utilized to accelerate isosurface ray tracing. The shell-based isovalue ray tracing method is more efficient than the K-D tree-based method if the isovalue is fixed; otherwise, the K-D tree-based method is more efficient, because the shell-based method needs to reconstruct the shell. To utilize the advantages of both methods, this paper presents a fast isovalue ray tracing method that combines the K-D tree and the Shell. The main point of the proposed method is how to switch between the two methods smoothly. The K-D tree-based isovalue ray tracing method is first improved to allow progressive construction of the shell during rendering. The improved K-D tree-based method is used when the isovalue changes, and a new shell is progressively constructed to allow a switch to the faster shell-based method. The presented method and the K-D tree-based method have similar speed when the isovalue changes frequently, and the presented method can achieve a speed similar to that of the shell-based method when the user performs only zoom in/zoom out/rotation operations. Results show that the presented method combines the advantages of both methods: it utilizes both the K-D tree and the Shell and thus achieves high rendering speed for both fixed-isovalue and dynamic-isovalue scenes.
Keywords: volume visualization; isosurface; ray tracing; K-D tree; Shell data structure
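A structural sketch of the switching logic described in the abstract is given below. The `kdtree`, `make_shell`, and shell methods (`is_complete`, `trace`, `add_cells`) are hypothetical placeholders standing in for the real data structures, so this is only an outline of the control flow, not a working tracer.

```python
class HybridIsosurfaceTracer:
    """Trace through the K-D tree while the isovalue is new, build the shell
    progressively from the cells visited, and switch to the faster shell-based
    tracer once the shell for the current isovalue is complete."""

    def __init__(self, kdtree, make_shell):
        self.kdtree = kdtree          # placeholder K-D tree over the volume
        self.make_shell = make_shell  # placeholder factory for an empty shell
        self.shell = None
        self.isovalue = None

    def trace(self, ray, isovalue):
        if isovalue != self.isovalue:            # isovalue changed: start a new shell
            self.isovalue = isovalue
            self.shell = self.make_shell(isovalue)
        if self.shell.is_complete():
            return self.shell.trace(ray)         # fast path: shell-based tracing
        hit, visited_cells = self.kdtree.trace(ray, isovalue)
        self.shell.add_cells(visited_cells)      # progressive shell construction
        return hit
```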
Abstract: Previous grid-based fluid simulation methods are inefficient and thus ill-suited to the interaction between objects and fluids. A multi-scale adaptive moving grid method is proposed in this study and applied to cloud scene simulation and to the simulation of interactions between objects and clouds. The resolutions of the global and local grids are selected adaptively, and oriented bounding box hierarchical trees of the objects are established to enhance simulation efficiency. Given the small kinematic viscosity of clouds, the governing and thermodynamic equations are simplified. The partial differential equations are discretized according to the proposed method, and upwind differencing is applied to ensure the stability of the simulation process. The method is designed to exploit GPU computing performance to improve simulation speed. Cloud scene simulations are performed in the tests. Results show that the proposed method can render cloud scenes in different periods realistically and thus meets the requirements of large-scale cloud scene simulation. Simulation of cloud scenes and of the interaction between rigid objects and clouds can be realized in real time. Compared with previous grid-based methods, the proposed method can be conveniently implemented on a GPU, which enhances simulation efficiency and the realism of rendered images.
Keywords: real-time rendering; multi-scale moving grids; cloud scene simulation; equation simplification; upwind difference
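The upwind differencing used to stabilize the advection step can be illustrated in one dimension; the actual simulator operates on 3D adaptive grids and on the GPU, so the following is only a sketch of the scheme itself, with illustrative grid spacing and time step.

```python
import numpy as np

def advect_upwind(q, vel, dt, dx):
    """One explicit time step of 1D advection q_t + v q_x = 0 using the
    first-order upwind difference: backward difference where the velocity
    is positive, forward difference where it is negative."""
    q_xm = (q - np.roll(q, 1)) / dx     # backward difference
    q_xp = (np.roll(q, -1) - q) / dx    # forward difference
    return q - dt * (np.maximum(vel, 0.0) * q_xm + np.minimum(vel, 0.0) * q_xp)

# Usage: advect a smooth bump to the right with periodic boundaries (CFL ~= 0.4).
x = np.linspace(0.0, 1.0, 200, endpoint=False)
q = np.exp(-200.0 * (x - 0.3) ** 2)
for _ in range(100):
    q = advect_upwind(q, vel=1.0, dt=0.002, dx=x[1] - x[0])
```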
Abstract: Query-by-example spoken term detection for low-resource languages has recently drawn considerable research interest. For low-resource languages that lack sufficient annotated data and related expert knowledge, spoken term detection techniques based on traditional large-vocabulary speech recognition cannot be used directly. Researchers have therefore attempted to perform this task for low-resource languages with unsupervised techniques. In this study, we first present the challenges confronting this task. We then introduce the algorithm framework based on dynamic time warping (DTW) that is commonly used for this task. We finally present recent research devoted to feature representation, template matching, speed-up, and other related topics. Although research on this technique for low-resource languages has made considerable progress, real-life applications are still lacking; unified feature representation and indexing methods must be proposed to attain both good effectiveness and efficiency. We also present the commonly used performance evaluation standards. The conclusions of our investigation are presented, and possible future research directions are discussed.
Keywords: spoken term detection; low-resource; dynamic time warping
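The DTW recurrence at the core of the surveyed framework is sketched below for two feature sequences; practical query-by-example systems use subsequence or segmental variants over posteriorgram-like features, so this is only the basic alignment.

```python
import numpy as np

def dtw_distance(query, doc):
    """Classic dynamic time warping between two feature sequences
    (frames x dims), using squared Euclidean frame distances."""
    n, m = len(query), len(doc)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.sum((query[i - 1] - doc[j - 1]) ** 2)
            # extend the cheapest of the three allowed predecessor paths
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]
```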
Abstract: A single captured image of a real-world scene is frequently insufficient to reveal all details. To address this problem, images of the same scene captured by multiple sensors, or by the same sensor with different imaging properties, are typically combined into a single image by image fusion techniques. A novel technique based on the fast discrete curvelet transform (FDCT) for improving image fusion quality is presented in this study. The source images are first decomposed via the FDCT. A new fusion rule, unlike those of previous image fusion methods, is then proposed to fuse the low-frequency and high-frequency coefficients: the low-frequency coefficients are fused by local energy, whereas the high-frequency coefficients are fused by the sum-modified-Laplacian. Applying this rule selects the most important feature information as the fused coefficients. Finally, the inverse FDCT is applied to reconstruct the resultant image from the fused coefficients. Several image sets, including multimodal medical, infrared-visible, and multifocus images, are used in the experiments. Experimental results demonstrate that the proposed technique outperforms traditional methods, such as pixel averaging and wavelet transform, as well as other state-of-the-art methods, including FDCT-based fusion and a bilateral-gradient-based method, in terms of both subjective and objective evaluations. The proposed fusion algorithm retains the most important feature information and exhibits performance superior to other methods for multimodal and multifocus image fusion.
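The two fusion rules can be sketched on coefficient sub-bands as follows; the FDCT decomposition and reconstruction themselves are omitted, and the window size is an illustrative assumption rather than the paper's setting.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fuse_low(a, b, win=3):
    """Low-frequency rule: keep the coefficient with the larger local energy."""
    ea = uniform_filter(a * a, size=win)
    eb = uniform_filter(b * b, size=win)
    return np.where(ea >= eb, a, b)

def fuse_high(a, b, win=3):
    """High-frequency rule: keep the coefficient with the larger
    sum-modified-Laplacian (SML) over a local window."""
    def sml(c):
        ml = (np.abs(2 * c - np.roll(c, 1, 0) - np.roll(c, -1, 0))
              + np.abs(2 * c - np.roll(c, 1, 1) - np.roll(c, -1, 1)))
        return uniform_filter(ml, size=win)
    return np.where(sml(a) >= sml(b), a, b)
```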
Abstract: The latest video coding standard, High Efficiency Video Coding (HEVC), adopts a more flexible structure and new coding tools compared with earlier coding standards. The adoption of these new technologies results in weakly correlated prediction residual blocks, on which frequency-domain methods such as the DCT achieve poor energy concentration. Among the new coding tools, the transform skip mode enhances coding efficiency effectively but increases coding complexity considerably, which makes practical real-time coding more difficult. To address this problem, a fast algorithm that prunes the transform skip mode in advance is proposed. The square root of the rate-distortion cost is selected as the threshold value by analyzing the distribution of residual blocks that select the transform skip mode as the best mode at different rate-distortion costs. To save bits, the HEVC standard specifies that when the coefficients of a transform unit are all zero after transform skip and quantization, the DCT/DST transform is directly chosen as the best mode. A larger quantization parameter implies a larger percentage of all-zero blocks after the transform skip mode; therefore, an exponential model relating the rate-distortion cost to the quantization parameter is established. In the actual coding process, the thresholds can be calculated in advance from the quantization parameters and are used to reduce the number of transform units that need to check the transform skip mode, thereby reducing the complexity of the transform skip mode. Only a small number of blocks need to check the transform skip mode, and the thresholds are obtained by offline training; thus, no additional complexity is added, and the computational complexity of the encoder is reduced. Experimental results show that, compared with the standard encoder, the fast algorithm has minimal effect on PSNR and bit rate for standard test sequences covering different scenes. On average, about 70% of the transform units do not need to check the transform skip mode. The exponential model established in this paper fits very well, with a coefficient of determination larger than 0.95. The algorithm selects the square root of the rate-distortion cost as the threshold for pruning the transform skip mode and obtains the threshold from the exponential model and the quantization parameter, so the transform skip mode can be pruned in advance. Experimental results indicate that the fast algorithm can effectively reduce the coding complexity introduced by the transform skip mode with negligible performance loss. The proposed algorithm can be applied to real-time scenarios: it reduces the time spent on the transform skip mode significantly and can be optimized further. The relationship between coding efficiency and coding time can be balanced dynamically by modeling the performance loss against the pruning percentage in further research.
Keywords: high efficiency video coding (HEVC); video coding; transform skip; rate-distortion cost
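The threshold test can be sketched as below. The exponential coefficients a and b are hypothetical placeholders, not the trained values from the paper, and the exact comparison is an assumption about how the described threshold would be applied inside an encoder.

```python
import math

def skip_ts_check(rd_cost_dct, qp, a=0.5, b=0.12):
    """Decide whether the transform-skip (TS) mode check can be pruned for a
    transform unit: compare the square root of the DCT/DST rate-distortion
    cost against a QP-dependent threshold from an exponential model."""
    threshold = a * math.exp(b * qp)            # offline-trained exponential model (placeholder a, b)
    return math.sqrt(rd_cost_dct) < threshold   # prune the TS check when below the threshold

# Usage with a hypothetical RD cost and QP:
if skip_ts_check(rd_cost_dct=900.0, qp=32):
    pass  # encode with DCT/DST only; do not evaluate transform skip for this unit
```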
Abstract: Supervised machine learning methods have been applied to multi-label image classification problems with tremendous success. Despite their different learning mechanisms, the performance of these methods relies heavily on the number of labeled training examples. However, the acquisition of labeled examples requires significant effort from annotators, which hinders the application of supervised learning methods to large-scale problems. In this paper, we extend an active querying method driven by informativeness and representativeness from single-label learning to multi-label image classification. Because the training set grows during active learning, the classifier needs to be updated accordingly; retraining a new classifier on all training samples would increase the training burden significantly, so a highly effective online learning algorithm is explored to speed up learning. To handle massive multi-label classification problems, a novel algorithm named active learning with informativeness and representativeness for online multi-label learning (AIR-OML) is presented, which addresses both the sample selection strategy and the classifier update issue. The sample selection strategy in active learning is based on the min-max theory and queries the most informative and most representative examples to retrain the multi-label learner. Kullback-Leibler divergence and the Karush-Kuhn-Tucker conditions are utilized to update the multi-label classifier online in real time. Combining active learning with online learning yields the AIR-OML algorithm. Experiments are conducted on four real-world datasets with four evaluation criteria. Experimental results demonstrate that the explored sample selection strategy has a significant advantage over two existing sample selection strategies, and the classifier achieves the best performance with fewer new samples by querying the most informative and representative examples. The AIR-OML algorithm reduces the cost of human annotation and updates the classifier in a timely manner when newly labeled examples arrive.
Keywords: multi-label classification; active learning; online learning; min-max theory
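As a rough illustration of combining informativeness and representativeness in query selection, the sketch below scores unlabeled examples with a simple weighted combination; it is a simplified surrogate for the min-max criterion, not the AIR-OML objective, and the weighting alpha is an assumption.

```python
import numpy as np

def query_index(clf_scores, unlabeled_feats, alpha=0.5):
    """Pick one unlabeled example to query: informativeness is taken as the
    smallest absolute label decision value (closeness to a boundary), and
    representativeness as the mean cosine similarity to the unlabeled pool."""
    # clf_scores: (n_samples, n_labels) real-valued outputs of the current model
    informativeness = -np.min(np.abs(clf_scores), axis=1)        # higher = more uncertain
    x = unlabeled_feats / np.linalg.norm(unlabeled_feats, axis=1, keepdims=True)
    sim = x @ x.T
    representativeness = (sim.sum(axis=1) - 1.0) / (len(x) - 1)  # mean similarity to the others
    score = alpha * informativeness + (1 - alpha) * representativeness
    return int(np.argmax(score))
```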
Abstract: Background: In the past few years, the great success of social networks and multimedia technologies has led to the generation of a vast number of Internet videos. To organize these videos and to provide value-added services to users, human activities in videos should be recognized automatically, and a number of research studies have focused on this challenging topic. Human action recognition is a significant research topic in computer vision, and recognizing human actions in unconstrained videos is difficult because of complex backgrounds and camera motion. A robust, salient trajectory-based approach is proposed to address this problem. Dense optical flow is utilized to track scale-invariant feature transform keypoints at multiple spatial scales. The histogram of oriented gradients, histogram of optical flow, and motion boundary histogram are employed to describe the trajectories efficiently. To eliminate the influence of camera motion, a camera motion estimation approach based on adaptive background segmentation is utilized to improve the robustness of the trajectories. The Fisher vector model is utilized to compute one Fisher vector over the complete video for each descriptor separately, and a linear support vector machine is employed for classification. The average improvement of the salient trajectory algorithm over the dense trajectory algorithm is 1% on four challenging datasets, and applying the camera motion elimination approach improves the average result over salient trajectories by a further 2%. On the four datasets (Hollywood2, YouTube, Olympic Sports, and UCF50), the proposed algorithm obtains 65.8%, 91.6%, 93.6%, and 92.1%, improving the previous state-of-the-art results by 1.5%, 2.6%, 2.5%, and 0.9%, respectively. Experimental results on the four challenging datasets demonstrate that the proposed algorithm can effectively recognize human actions in unconstrained videos in a more computationally efficient manner than a number of state-of-the-art approaches.
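A minimal sketch of the keypoint tracking step is given below, assuming OpenCV's SIFT detector and Farneback dense optical flow; descriptor computation (HOG/HOF/MBH), Fisher vector encoding, camera motion estimation, and the SVM are all omitted, and the flow parameters are generic defaults rather than the paper's settings.

```python
import cv2
import numpy as np

def track_keypoints(prev_gray, next_gray, points):
    """Propagate keypoint locations from one grayscale frame to the next with
    dense Farneback optical flow, as in dense-trajectory-style tracking.
    `points` is an (N, 2) array of (x, y) positions."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    xs = np.clip(points[:, 0].astype(int), 0, flow.shape[1] - 1)
    ys = np.clip(points[:, 1].astype(int), 0, flow.shape[0] - 1)
    return points + flow[ys, xs]        # shift each point by the local flow vector

# Usage: seed the tracks with SIFT keypoints detected in the first frame.
sift = cv2.SIFT_create()
# keypoints = sift.detect(first_gray_frame, None)
# points = np.array([kp.pt for kp in keypoints], dtype=np.float32)
```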
Abstract: In recent years, a large number of surveillance cameras have been deployed, and the number of surveillance videos that need to be observed and analyzed is rising rapidly because of increased road traffic monitoring and indoor surveillance needs. With the significant increase in the number of outdoor cameras, single cameras are insufficient for certain monitoring tasks. Multi-camera intelligent analysis of human behavior is therefore increasingly valuable in video surveillance, and multi-camera pedestrian re-identification is an important research direction. This paper proposes a new method of pedestrian re-identification based on feature transformation and dataset classification. In this method, a new pedestrian re-identification algorithm is proposed based on a transformation matrix that can largely eliminate the differences between features. The algorithm can be used in high-dimensional space, in which one vector is brought close to another through the transformation matrix; it is therefore capable of eliminating the feature differences of the same pedestrian. This paper also proposes a new algorithm based on classifying the pedestrian features of the dataset: the pedestrian features in each class have similar properties and can thus share the same transformation matrix, which allows the algorithm to eliminate the feature differences of a pedestrian across multiple cameras. In detail, blocks are built by clustering based on block features, and the transformation matrix of each block is trained on the multi-channel features under different cameras, so that each block has a corresponding transformation matrix. The correct block for each testing pedestrian is determined by comparison with the blocks of the training pedestrians; the transformation matrix of the chosen block is then the appropriate matrix and is used to eliminate the differences in the features of the testing pedestrian across cameras. Experimental results show that the proposed method improves the accuracy of finding the same pedestrian across cameras. Specifically, the Rank-1 matching rate of the method on the VIPeR dataset (50% training, 50% testing) is 22%, which is better than the results of other existing methods. The feature transformation module and the dataset classification module together match the same pedestrian across cameras, and the matching rate drops significantly if either module is removed. Moreover, experiments are designed to verify the robustness of the chosen feature; the results on a real street scene dataset show that the chosen feature is more distinctive and robust than some existing features. In summary, this paper proposes a new pedestrian re-identification method based on feature transformation and dataset classification. An appropriate transformation matrix eliminates the influence of illumination and pedestrian pose across cameras, brings the features of the same pedestrian close to each other, and significantly improves the accuracy of matching pedestrians across cameras.
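One common way to fit such a cross-camera feature transformation matrix for a block of similar pedestrians is closed-form ridge regression, sketched below; this is an illustrative choice under that assumption, not the paper's training procedure, and the regularization weight lam is a placeholder.

```python
import numpy as np

def fit_transform_matrix(feats_cam_a, feats_cam_b, lam=1e-2):
    """Fit a linear transformation W that maps camera-A features of a pedestrian
    close to the camera-B features of the same pedestrian (rows are paired
    samples), one W per block/class of similar pedestrians."""
    X, Y = feats_cam_a, feats_cam_b
    d = X.shape[1]
    # closed-form ridge regression: W = (X^T X + lam I)^{-1} X^T Y
    W = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)
    return W

# Usage: map a probe feature from camera A into camera-B feature space before matching.
# transformed = probe_feature @ W
```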
Abstract: Fat fingers, target occlusion, and physical fatigue often reduce the precision of touch input, and these factors can degrade the conditions of mobile handheld touch interaction. This study aims to identify strategies for improving the accessibility of small-target selection and the precision of touch input, and it empirically investigates and compares the performance of these strategies. The tilt and movement acceleration detection functions of mobile touch devices are used to explore the performance, characteristics, and suitable usage of four target selection techniques, namely, direct touch, shift & zoom, tilt, and attraction. Direct touch is a baseline technique used only for comparison; the other three are techniques proposed to improve target selection. With the tilt technique, subjects took the longest time to select the target and produced only brief finger movement displacement, but committed few selection errors. Shift & zoom was ranked as the most favored technique in the user evaluations. The attraction technique exhibited the fewest selection errors and the shortest selection time among the improved techniques, whereas subjects committed the highest number of selection errors with direct touch. Targets located in the northwest or southwest corners of the screen are more likely to be selected successfully. The average selection time, error rate, and subjective evaluation of direct touch, shift & zoom, tilt, and attraction are, respectively: 86.06 ms, 62.28%, and 1.95; 1 327.99 ms, 6.93%, and 3.87; 1 666.11 ms, 7.63%, and 3.46; and 1 260.34 ms, 6.38%, and 3.74. The three improved target selection techniques outperform direct touch, and each method has its own characteristics and trade-offs. This study presents guidelines for target selection in touch interaction design on mobile devices.
Abstract: Virtual manufacturing environments require complex and accurate three-dimensional (3D) human-computer interaction (HCI). The main problems of current virtual environments (VEs) are the heavy cognitive and motor burdens placed on the user and the need to improve HCI efficiency. These problems are addressed by enhancing the cognitive capability of the machine. This study investigates how user intents are analyzed and abstracted, and constructs multimodal intent understanding algorithms. Intent-based VE construction is practiced in a virtual assembly system, and experiments on typical intents are conducted and analyzed. A comprehensive evaluation of the usability and reliability of multimodal intent understanding is presented, and the intent-based VE system is shown to run in real time. The experiment focuses on the intent of object picking in the VE. When the distance between the 3D cursor and the object is 5 000 mm, the operation takes 5.344 7 s on average in the traditional system, whereas it takes 2.326 6 s on average in the intent-based system. The intent-based system significantly reduces operation time and manipulation complexity. Intent-driven scenario transition can significantly enhance the naturalness and efficiency of HCI and effectively reduce the complexity of analyzing and developing human-centered VE systems. The application of intent understanding demonstrates that multimodal intent models and algorithms can efficiently promote the naturalness and efficiency of HCI, and this system construction method can be used in any VE system.
Keywords: intent; human-computer interaction; virtual environment; eye movement tracking; multimodal
Abstract: This paper aims to measure the grasp force of the human hand when statically grasping objects and to evaluate the authenticity of a virtual-finger grasping haptic algorithm. A human hand grasp force measurement platform is built, and the experimental protocol is designed to cover three static grasping postures for grasping a spherical object. Five testers take part in the experiment. The measured data are analyzed and compared with the values calculated by the virtual-finger grasping haptic algorithm when the virtual hand grasps objects with the same static grasping postures as the testers. The difference between the measured and theoretical values of a single finger force or of the resultant force of several fingers lies within 1% to 6%. The comparison shows that the measured data are close to the calculated values and are consistent with everyday human grasping experience. This paper thus provides a physical evaluation method to validate the authenticity of the virtual-finger grasping haptic algorithm. The algorithm generates realistic grasping forces and can be applied to natural virtual hand interaction with force feedback.
Abstract: Synchronous haptic-visual-audio multi-sensory feedback should be rendered to realize an interactive simulation of musical instrument playing. For the haptic feedback, 6-DoF haptic rendering of multi-region contacts between a rigid hand and the surface of the instrument is a fundamental requirement. In this study, we propose a hybrid model to simulate the various parts of the instrument for collision response: an analytic model (a line segment) for the thin components such as the strings, and a sphere-tree for the other parts. We propose a discrete collision detection method to detect the collision pairs between the sphere-tree of the virtual hand and the analytic model of the strings. A constraint-optimization-based model is solved with an active set method to obtain a non-penetrating configuration of the virtual hand. To simulate string deformation, a cylindrical volume with variable diameter is defined in response to the real-time interaction force applied by the hand, so that different force sensations can be rendered when the hand collides with a static or a vibrating string. Experimental results based on the Phantom Premium 3.0 show that the proposed method can simulate single- and multi-region contacts between the virtual hand and the strings, and six-dimensional force and torque can be simulated stably. Users can sense the subtle feeling of touching a vibrating string. Several multi-region contact cases are tested with the discrete collision detection method: no penetration occurs, and the refresh frequency of the collision detection, constraint optimization, and deformation simulation loop stays above 1 kHz. The penalty method may cause a distinct torque jump because of changes in the contact position and normal, causing a discrepancy between the haptic and visual sensations; the proposed constraint-optimization-based method improves computational precision.
Keywords: haptic rendering; constraint optimization; hybrid model; musical instrument playing
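A minimal sketch of one collision test implied by the hybrid model, namely a hand sphere against a string modeled as a line segment, is given below; the constraint optimization, active-set solver, and variable-diameter string deformation are beyond this sketch.

```python
import numpy as np

def sphere_segment_contact(center, radius, p0, p1):
    """Discrete collision test between one sphere of the hand's sphere-tree and
    a string segment p0-p1: find the closest point on the segment and compare
    its distance to the sphere center with the sphere radius."""
    seg = p1 - p0
    denom = np.dot(seg, seg)
    t = 0.0 if denom == 0.0 else np.clip(np.dot(center - p0, seg) / denom, 0.0, 1.0)
    closest = p0 + t * seg                       # closest point on the segment
    delta = center - closest
    dist = np.linalg.norm(delta)
    if dist >= radius:
        return None                              # no contact
    normal = delta / dist if dist > 1e-9 else np.array([0.0, 0.0, 1.0])
    return closest, normal, radius - dist        # contact point, normal, penetration depth
```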