Abstract: The concept of compressed sensing has become a crucial topic in information research since its introduction. In compressed sensing theory, the sampling and compression of information are conducted simultaneously, thereby reducing storage space and computation. The sampling rate can fall below the limit set by the Shannon-Nyquist theorem, which requires a rate of at least twice the signal bandwidth; consequently, far fewer samples are needed, and the signal can be reconstructed rapidly from minimal sampling. The block compressed sensing reconstruction algorithm uses the same sampling rate for each image block, which results in an unreasonable allocation of resources, and the reconstruction process produces a block effect. The multi-scale block compressed sensing reconstruction algorithm also exhibits the following problems: 1) the algorithm reduces block size to decrease storage space, which increases computation; 2) rough block edges remain when reconstructing a complex image. The edge- and direction-based block compressed sensing algorithm allocates the total sampling rate to the layers of sub-bands by utilizing edge and direction characteristics. The adaptive multi-scale block compressed sensing algorithm allocates the sampling rate to each image block by using the texture and directionality of the image. These two algorithms reduce the number of samples to a certain extent, which considerably saves resources and improves reconstruction performance. Of the two, adaptive sampling is more effective for images with a large gap between smooth and complicated regions. Both algorithms reconstruct the low-frequency coefficients after wavelet transformation; thus, the reconstruction of details in a complex image is non-ideal. An improved adaptive multi-scale block compressed sensing algorithm is proposed in this study to solve the aforementioned problems. The improved algorithm is expected to use the low- and high-frequency information of an image reasonably and to enhance reconstruction quality when the image has high detail complexity. In the algorithm, a three-layer wavelet transform is conducted to obtain one low-frequency sub-band and nine high-frequency sub-bands, which are then processed by the inverse wavelet transform and divided into non-overlapping blocks of the same size. After the wavelet transform, a significant amount of the energy of the original image is concentrated in the low-frequency signal, which leads to strong resistance to noise and good stability. The low-frequency signal is processed using a 2D adjacent-block edge adaptive weighted filtering method. The jump in edge pixel gray levels between adjacent blocks produces the block effect; thus, the edge characteristics of the four adjacent blocks around a pixel are used for filtering. The edge pixel values of the adjacent blocks are weighted by distance to obtain the updated pixel value. The wavelet transform of an image can be expressed as the superposition of the approximation signal of the low-resolution image and all the detail signals, which are obtained by decomposing each stage and belong to the high-frequency signal. An adaptive texture block-sampling method is adopted for the high-frequency signal, and the smoothed projected Landweber algorithm is used to reconstruct the image. The flat image is overlapped after weighted filtering, and higher-resolution reconstructed images can be obtained.
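To make the adaptive allocation idea concrete, the following NumPy sketch samples each block at a rate scaled by its variance; the variance criterion, the block size, and the Gaussian measurement matrices are illustrative assumptions, not the paper's exact texture-and-direction rule.

import numpy as np

def adaptive_block_cs_sample(image, block=16, base_rate=0.1, max_rate=0.5, seed=None):
    """Sample each non-overlapping block with a rate scaled by its variance
    (a stand-in for the paper's texture/directionality criterion)."""
    rng = np.random.default_rng(seed)
    h, w = image.shape
    variances = np.array([
        image[i:i + block, j:j + block].var()
        for i in range(0, h, block) for j in range(0, w, block)
    ])
    vmax = variances.max() + 1e-12
    measurements, idx = [], 0
    for i in range(0, h, block):
        for j in range(0, w, block):
            x = image[i:i + block, j:j + block].reshape(-1)
            # more measurements for high-variance (textured) blocks
            rate = base_rate + (max_rate - base_rate) * variances[idx] / vmax
            m = max(1, int(rate * x.size))
            phi = rng.standard_normal((m, x.size)) / np.sqrt(m)  # Gaussian sensing matrix
            measurements.append((phi, phi @ x))
            idx += 1
    return measurements

# y = adaptive_block_cs_sample(np.random.rand(64, 64))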
Compared with existing algorithms, such as the block compressed sensing algorithm, the edge- and direction-based block compressed sensing algorithm, and the texture-based block compression algorithm, the proposed algorithm exhibits improved performance under different sampling rates. The high-frequency signal that represents image details is fully reconstructed. The reconstructed images obtained with the improved algorithm have higher resolution, particularly images with considerable detail, and attain a high peak signal-to-noise ratio. The 2D adjacent-block edge adaptive weighted filter is an effective method for removing the block effect from the reconstructed image, and reconstruction time is reduced by 0.3 s. After the three-layer wavelet transform, the high-frequency signal is treated as the texture part, and the adaptive multi-scale block compressed sensing algorithm is adopted to reconstruct the contours and edges of the image. The low-frequency signal is regarded as the flat part, and the neighboring-block filtering method is used to reconstruct image details. Reconstruction time is effectively shortened by reasonably utilizing the low- and high-frequency information of an image and by reducing the processing needed to detect flat blocks. The experiments show that the quality of the reconstructed images is evidently improved, particularly for images with significant detail. The block effect is also largely eliminated, and the reconstructed images have clear edges and texture details. Thus, the adaptive multi-scale block compressed sensing algorithm is mainly applicable to images with complex texture, such as face, architectural, and remote sensing images.
Abstract: The exemplar-based image inpainting method known as the Criminisi algorithm occasionally exhibits a poor inpainting effect because the confidence term easily decreases to zero, which invalidates the priority and disorders the inpainting sequence. Excessive search scope, low efficiency, and non-visual texture-matching problems also occur when searching for matching patches. To solve these problems, an image completion algorithm based on a redefined priority and image division is proposed in this study. First, the confidence term in the priority is redefined, and the chessboard distance within the exemplar patch is used to replace the original calculation formula. Accordingly, the priority remains valid, and the matching error caused by an unreasonable inpainting order is reduced. Second, the image is divided into blocks of different sizes according to image texture information, such that the search for exemplar patches is confined to image block regions with similar features. Experimental results show that the newly defined priority guarantees the completion of the algorithm and improves the visual effect of the inpainted image. The algorithm is fast because only a few ambiguous matching candidates are searched under the guidance of image division. The completion results of our method are compared with those of other methods. Subjectively, our method maintains visual connectivity. Objectively, the time consumed is less than that of most of the other methods. The accumulation of errors caused by a disordered inpainting sequence is avoided by redefining the confidence term in the priority of the Criminisi algorithm. Reduced running time and improved matching accuracy are also achieved by restricting the search range of exemplar patches: the entire image is adaptively divided into blocks, and the search is conducted only in blocks similar to the destination. The method performs well in object removal in natural images, and its completion effects are satisfactory.
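The redefined confidence term can be illustrated with a small sketch; the exact weighting below (inverse chessboard distance to the patch center) is an assumption for illustration, not the paper's formula.

import numpy as np

def chebyshev_confidence(mask, p, patch=9):
    """Confidence of the patch centered at p, with known pixels weighted by
    their chessboard (Chebyshev) distance to the center. mask is True where
    the pixel is known (outside the hole)."""
    r = patch // 2
    ci, cj = p
    total, weight = 0.0, 0.0
    for i in range(ci - r, ci + r + 1):
        for j in range(cj - r, cj + r + 1):
            if 0 <= i < mask.shape[0] and 0 <= j < mask.shape[1]:
                d = max(abs(i - ci), abs(j - cj))   # chessboard distance
                w = 1.0 / (1.0 + d)                 # closer pixels count more
                weight += w
                if mask[i, j]:
                    total += w
    return total / weight if weight else 0.0

# mask = np.ones((32, 32), bool); mask[10:20, 10:20] = False   # a square hole
# print(chebyshev_confidence(mask, (10, 15)))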
Keywords: image completion; confidence term; priority; image division; texture features
Abstract: Visual acquisition is a necessary step in image processing. However, under the low-illumination conditions of nighttime, images obtained by a visual system lose a considerable number of effective features and thus tend to have low contrast and brightness. This scenario negatively affects subsequent computer vision applications, such as intelligent surveillance, object detection, and pedestrian tracking. Image enhancement is generally regarded as an effective means of improving visual effects; it augments valid information by increasing the differences among features and enhancing regions of interest. Existing image enhancement methods include direct and indirect enhancement algorithms. The concept of indirect enhancement originated with Dong, who found that an inverted low-light image resembles a foggy image. Accordingly, low-illumination image enhancement can be recast as fog image restoration, and fog removal algorithms have been introduced to perform an enhancement-like process. However, several enhancement algorithms for low-illumination images are sensitive to noise and easily saturated. This study proposes an indirect low-illumination image enhancement algorithm to improve image quality and avoid these imperfections. Retinex theory is based on color constancy. An image is decomposed into two parts: an illumination map and a reflectance map. The original reflectance component is obtained by reducing, or even eliminating, the impact of the illumination map on the reflectance map. Multi-scale retinex with color restoration (MSRCR), which builds on multi-scale retinex, is an optimized solution to the color distortion caused by retinex; its key point is the introduction of a color recovery factor, which adjusts the proportional relationship of each RGB component of the original image. The He-based method is also considered. The atmospheric scattering model is the fundamental structure of the He-based method. A fog-relevant statistic, namely, the dark-channel prior, is obtained by observing fog images. Atmospheric light is acquired from this information, coarse transmissivity is estimated using a minimum filter, and the soft matting interpolation method is applied for optimization. Similar to the He-based method, our method is based on a fog-degradation model. First, our method inverts a low-illumination image to generate a pseudo-fog image. The characteristics of a pseudo-fog image differ from those of a real foggy image: it is mainly characterized by a large bright area and a high atmospheric light value. The classic dark-channel theory for fog images is inapplicable to large bright regions and leads to imprecise transmission map estimation. Second, our method trains a specific convolutional neural network (CNN) to directly predict the optical transmissivity of the input image. Synthetic pseudo-fog patches are selected as input, pseudo-fog features are acquired via the progressive convolutional operations of each layer, and a BReLU activation function outputs a corresponding transmission parameter ranging from 0 to 1. Third, our method obtains an atmospheric light map in which the global atmospheric light is replaced with local atmospheric light values to address the oversaturation caused by global atmospheric light. Guided filtering is adopted to refine the optical transmissivity map and the atmospheric light map, with the gray scale of the pseudo-fog image used as the guide image to optimize the transmission map obtained by the CNN and the atmospheric light map. The final enhanced image is the inverse of the fog-removed image, which is restored based on the atmospheric scattering model. This study designs three sets of controlled experiments, which compare classical image enhancement algorithms, namely, retinex and MSRCR, and indirect enhancement algorithms, including the He-based algorithm. The first set of experiments is conducted from the perspective of subjective visual effects. The second set is evaluated objectively using several evaluation indices, including average gradient, information entropy, and peak signal-to-noise ratio (PSNR). The last experiment verifies the effectiveness of the parameters (i.e., transmissivity and atmospheric light) obtained using the proposed method. The subjective and objective results show that retinex is sensitive to noise and causes color distortion, MSRCR tends to cause overexposure due to an excessive increase in mean value, and the He-based algorithm, which relies on the dark-channel prior, is liable to produce saturation because of its inapplicability to pseudo-fog images, whereas our method effectively improves visual effects. Our method not only effectively improves the brightness of low-illumination images but also avoids evident color distortion and overexposure in the restored images. On the PSNR index, our algorithm exceeds the sub-optimal method by an average of 2.6 dB. The CNN is a widely used feature extraction method. We adopt it to learn pseudo-fog features via large-scale training on synthetic data, replace the global atmospheric light map with a local one to avoid saturation, and apply guided filtering to refine this map. The qualitative and quantitative experiments show that the proposed algorithm improves visual effects, adapts well to different scenes, and can enhance both indoor and outdoor low-light images. Combined with CUDA technology, our method can be used for the real-time enhancement of videos.
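As a rough illustration of the invert-dehaze-invert pipeline (requiring NumPy and SciPy), the sketch below substitutes a classic dark-channel estimate for the paper's CNN-predicted transmission and local atmospheric light map, so it is a structural sketch rather than the proposed method.

import numpy as np
from scipy.ndimage import minimum_filter

def enhance_low_light(img, omega=0.8, t0=0.1, patch=15):
    """Invert -> dehaze (atmospheric scattering model) -> invert back.
    img is float RGB in [0, 1]; the dark-channel transmission estimate
    below stands in for the paper's CNN prediction."""
    pseudo_fog = 1.0 - img
    dark = minimum_filter(pseudo_fog.min(axis=2), size=patch)      # dark channel
    # crude atmospheric light: mean of the brightest values per channel
    A = np.partition(pseudo_fog.reshape(-1, 3), -100, axis=0)[-100:].mean(axis=0)
    t = np.clip(1.0 - omega * dark / A.max(), t0, 1.0)             # coarse transmission
    restored = (pseudo_fog - A) / t[..., None] + A                 # scattering model
    return np.clip(1.0 - restored, 0.0, 1.0)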
Abstract: Greedy algorithms for compressed sensing have received considerable attention because of their low computational complexity, high speed, and good recovery performance in image reconstruction. As a typical greedy reconstruction method, the compressed sampling matching pursuit (CoSaMP) algorithm exhibits good robustness and can guarantee the reconstruction of signals against an arbitrary observation noise background. However, the CoSaMP algorithm requires a large number of observations in the atom selection process, which affects reconstruction time. Its atom selection principle can also lead to inaccurate estimation of the support set, reducing reconstruction precision. Therefore, a stagewise iterative matching pursuit (StIMP) algorithm based on the generalized inverse was proposed to improve the reconstruction accuracy and performance of the CoSaMP algorithm. The observation matrix was processed via the generalized inverse before iteration to ensure the accuracy and rapidity of atom selection during iteration, which reduced mutual interference among atoms. The iterative process was divided into two stages. The first stage used the orthogonal matching pursuit (OMP) algorithm and iterated several times, given that most of the atoms selected via the OMP algorithm are accurate. The number of iterations in the first stage should not be too small, so as to improve the accuracy of the support set; in consideration of reconstruction time, it was set to K/2, where K is the signal sparsity, assumed in this study to be an even number. If K is odd, the floor value ⌊K/2⌋ is taken. The CoSaMP algorithm was used to continue the iteration in the second stage. The initial input of the CoSaMP stage was the residual and the atoms obtained from the OMP iterations of the first stage, which reduced the dependence of the CoSaMP algorithm on sparsity. The atom selection criteria of the CoSaMP algorithm were changed at the same time: 3K/2 atoms were selected instead of 2K, and K/2 atoms were removed each time to ensure that the support set would contain K atoms. A threshold was set to control the stopping of the algorithm and to reconstruct sparse signals rapidly and accurately.
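A simplified sketch of the two-stage scheme follows; the generalized-inverse preprocessing is omitted and the pruning step is condensed (candidates are cut straight back to K), so this illustrates the structure rather than the authors' implementation.

import numpy as np

def stimp(y, Phi, K, max_iter=30, tol=1e-6):
    """Stage 1: floor(K/2) OMP iterations. Stage 2: CoSaMP-style iterations
    that add 3K/2 candidate atoms and prune the support back to K."""
    support, r = [], y.copy()
    idx, x_s = [], np.zeros(0)
    for _ in range(K // 2):                                   # stage 1: OMP
        support.append(int(np.argmax(np.abs(Phi.T @ r))))
        x_s, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        r = y - Phi[:, support] @ x_s
    support = set(support)
    for _ in range(max_iter):                                 # stage 2: CoSaMP-style
        cand = sorted(support | set(np.argsort(np.abs(Phi.T @ r))[-(3 * K // 2):]))
        x_c, *_ = np.linalg.lstsq(Phi[:, cand], y, rcond=None)
        support = {cand[i] for i in np.argsort(np.abs(x_c))[-K:]}   # keep K largest
        idx = sorted(support)
        x_s, *_ = np.linalg.lstsq(Phi[:, idx], y, rcond=None)
        r = y - Phi[:, idx] @ x_s
        if np.linalg.norm(r) < tol:                           # stopping threshold
            break
    x = np.zeros(Phi.shape[1])
    x[idx] = x_s
    return x

# Phi = np.random.randn(64, 256); x0 = np.zeros(256); x0[[3, 40, 100, 200]] = 1.0
# stimp(Phi @ x0, Phi, 4) should typically recover the support {3, 40, 100, 200}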
For 1D Gaussian random signals, the signal reconstructed by the proposed StIMP algorithm is close to the original signal, and the error range is (-5e, 5e). The trend of the success rate with signal sparsity shows that the StIMP algorithm is more robust and has a higher reconstruction success rate than the OMP, CoSaMP, regularized OMP (ROMP), and Fourier ring (FR)-CoSaMP algorithms. For 2D image signals, the images reconstructed using the OMP and ROMP algorithms exhibit problems of subjective quality, such as large noise particles, high fuzziness, and poor visual comfort, whereas the CoSaMP, FR-CoSaMP, and StIMP algorithms achieve better image reconstruction with higher resolution, better contrast, and a sense of hierarchy. However, the CoSaMP and FR-CoSaMP reconstructions are not as smooth as those of the StIMP algorithm, whose reconstructed images are smoother and closer to the original image. In terms of objective quality, the average peak signal-to-noise ratios (PSNRs) of the StIMP algorithm are higher than those of the other reconstruction algorithms at every sampling rate. For example, when the sampling rate is 0.7, the average PSNR of the StIMP algorithm is higher than those of the OMP, CoSaMP, ROMP, and FR-CoSaMP algorithms by 2.14 dB, 1.20 dB, 3.67 dB, and 0.90 dB, respectively. The reconstruction time of the StIMP algorithm is only 7.85 s, whereas those of the OMP, CoSaMP, and FR-CoSaMP algorithms are 11.79 s, 32.86 s, and 11.09 s, respectively. An improved reconstruction algorithm with high reconstruction speed and considerable effect on 1D Gaussian random signals and 2D image signals was thus proposed. Compared with the original algorithm, StIMP demonstrates high efficiency and practicability and has practical significance for processing medical and remote sensing images. Because the StIMP algorithm is built on OMP and CoSaMP, it must be run under the premise of known signal sparsity. Further study should draw on the principle of the sparsity adaptive matching pursuit algorithm to improve StIMP so that it can adapt to cases with unknown signal sparsity.
Abstract: Fingerprint identification is an important and efficient technique for biometric recognition. Fingerprints have become the most widely used biometric feature in recent years given their uniqueness and immutability. Fingerprint matching is a core research topic in automatic fingerprint recognition systems, and matching algorithms directly influence the performance of a recognition system. Most point pattern-matching algorithms depend on the orientation field or directed graph of fingerprint images; that is, the matching of points is transformed into the matching of vectors composed of two feature points. The fingerprint orientation field or directed graph of the same finger frequently varies across collection times because the input fingerprint images exhibit translation, rotation, and scale changes. Consequently, the calculation in most point pattern-matching algorithms is extremely difficult. Point pattern-matching algorithms are also sensitive to the translation, rotation, and scale changes of fingerprint images, particularly rotation, and some of them cannot handle rotated fingerprint images at all. Therefore, a triangle-matching algorithm that is independent of orientation is proposed, and a detailed procedure for composing the congruent triangle is introduced in this study to improve the precision of calculation. A triangle exhibits stability, invariance, and uniqueness: the positional structure between any point and a given triangle on a plane is stable. The proposed triangle-matching algorithm is designed based on this property. The algorithm efficiently avoids the orientation field or directed graph and significantly reduces calculation. The proposed algorithm, being independent of the orientation field or directed graph, also shows preferable stability and robustness at different rotation angles. Fingerprint identification can generally be divided into three main stages: preprocessing of fingerprint images, feature extraction, and feature matching. On the basis of this framework, the proposed algorithm proceeds in three steps. First, two benchmark triangles are constructed in the input fingerprint and the template fingerprint, respectively. Second, ordered arrays are composed of the distances from every feature point to the three vertices of a benchmark triangle. Third, fingerprint image matching is decided based on the similarity of the ordered arrays. Overall performance comparison experiments, covering the complete fingerprint-matching process, equal error rate, false match rate, false acceptance rate, receiver operating characteristic curve, and matching time, are completed on the FVC2004 fingerprint database, an international standard test library. Experimental results show that compared with other fingerprint-matching algorithms, the proposed algorithm improves accuracy by 27.97% to 33.81%, reduces matching time by 3% to 5%, and decreases the average matching error by approximately 86.63%. The proposed algorithm also outperforms the compared algorithms in terms of adaptive capacity, accuracy, and robustness for fingerprint images with noise, translation, rotation, and deformation.
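The ordered-array construction can be sketched as follows; the tolerance-based similarity score is an illustrative stand-in for the paper's similarity measure. Because rigid motions preserve point-to-vertex distances, the arrays are rotation- and translation-invariant by construction.

import numpy as np

def ordered_distance_array(points, tri):
    """For each feature point, the distances to the three vertices of the
    benchmark triangle, with the rows sorted lexicographically."""
    d = np.linalg.norm(points[:, None, :] - tri[None, :, :], axis=2)   # (N, 3)
    return d[np.lexsort(d.T[::-1])]

def triangle_match_score(pts_a, tri_a, pts_b, tri_b, tol=5.0):
    """Fraction of points whose distance triples agree within tol pixels."""
    da = ordered_distance_array(pts_a, tri_a)
    db = ordered_distance_array(pts_b, tri_b)
    n = min(len(da), len(db))
    hits = np.all(np.abs(da[:n] - db[:n]) < tol, axis=1).sum()
    return hits / max(len(da), len(db))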
The proposed algorithm is a global model-matching algorithm that is unconstrained by the fingerprint orientation field and the locations of fingerprint images. Its calculation is significantly reduced compared with other point pattern fingerprint-matching algorithms, and its procedure and implementation rest on elementary mathematics. The experimental results indicate that the proposed algorithm demonstrates preferable adaptive performance for fingerprint images with noise, translation, rotation, and deformation. Furthermore, the proposed algorithm exhibits good robustness and can handle different types of images.
Keywords: fingerprint recognition; point pattern matching; triangle matching; orientation independence; benchmark triangle; adaptability and robustness
Abstract: Face recognition encounters significant challenges, particularly when images from different persons are similar to one another due to variations in illumination, expression, and occlusion. If sufficient training images of each person are available to span that person's facial variations under the testing conditions, then sparse representation-based classification (SRC) can achieve promising results. In many applications, however, the sample size is small and sufficient training images for each person are lacking. To solve these problems, this study presents a joint face recognition algorithm that learns a low-rank class dictionary and a sparse error dictionary based on the theory of sparse representation. The low-rank dictionary of each individual is a class-specific dictionary that captures the discriminative features of that individual, whereas the sparse error dictionary represents intra-class variations, such as illumination and expression changes. An initial low-rank dictionary and a sparse dictionary are obtained via low-rank decomposition. Then, by combining low-rank decomposition with structural incoherence, the discriminative low-rank class dictionary and the sparse error dictionary are trained and subsequently merged into the dictionary applied in the test stage. Our method decomposes the raw training data into a set of representative bases with corresponding sparse errors to model face images efficiently. We further promote structural incoherence among the bases learned from different classes: the regularization on structural incoherence encourages these bases to be as independent as possible, which provides additional discriminating capability over the original low-rank models and improves performance. A sparse coefficient vector is acquired using the L1-norm (homotopy) method, and test samples are classified based on the reconstruction error.
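The classification rule can be sketched as follows, with ordinary least squares standing in for the L1 homotopy solver used in the paper.

import numpy as np

def classify_by_residual(y, dicts, E=None):
    """Assign y to the class whose dictionary (augmented by the shared
    sparse error dictionary E, if given) reconstructs it with the
    smallest residual."""
    errors = []
    for D in dicts:
        A = np.hstack([D, E]) if E is not None else D
        x, *_ = np.linalg.lstsq(A, y, rcond=None)   # stand-in for L1 coding
        errors.append(np.linalg.norm(y - A @ x))
    return int(np.argmin(errors)), errors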
Experiments are conducted on the Extended Yale B and AR databases. A random matrix is used to reduce the dimensionality of the training samples and shorten execution time. In the Extended Yale B database, with samples of 504 dimensions and 32 training samples selected from each class, the recognition rate of the proposed algorithm is 96.9%. For the samples with 540 dimensions and without occlusion in the AR database, with 4 training samples selected from each class, the recognition rate is 83.3%; for the samples with 1 760 dimensions, it is 87.6%. For the samples with 540 dimensions and with occlusion, with 8 training samples selected from each class, the recognition rate is 94.1%, and the corresponding result on training samples with 1 760 dimensions and with occlusion is 94.8%. The experiments show that the results of the proposed algorithm exceed the highest recognition rates of the SRC (sparse representation-based classifier), DKSVD (discriminative K-SVD), LRSI (low-rank matrix decomposition with structural incoherence), and LRSE+SC (low-rank and sparse error matrix + sparse coding) algorithms, particularly in cases with insufficient training samples. To illustrate the importance of the sparse error bases, an experiment is first conducted in which the unobstructed images are used as training samples and the occlusion images as test samples; the recognition rate of the proposed algorithm is 43.6%. The sparse coefficients in the test phase cannot remove the interference because the sparse error dictionary lacks the components of scarves and sunglasses, so the recognition rate drops significantly. When the training and test samples are exchanged, that is, when occlusion images are used for training and unobstructed images for testing, the recognition rate of the proposed algorithm is 83.1%: although the sparse error bases then lack the components of illumination and expression changes, this interference is considerably smaller than that of scarves and sunglasses, and the recognition rate is thus significantly higher than in the former case. In practical applications, the sparse error bases alone are sufficient, particularly when images are heavily disturbed, and the recognition effect can be improved. The face recognition algorithm proposed in this study is robust and effective and achieves an ideal recognition effect, particularly when training samples are insufficient or face images are considerably disturbed. The algorithm is suitable for practical applications.
Abstract: Visual object tracking is a fundamental problem in computer vision with numerous applications, such as intelligent visual surveillance, human-machine interaction, and content-based video coding. In a generic tracking problem, the target can be any object, and only its initial location is known. Most state-of-the-art approaches address the tracking problem by learning a discriminative appearance model for the target object. Among discriminative tracking methods, correlation filter-based approaches have recently demonstrated excellent performance on benchmark tracking datasets. Despite the significant developments of recent decades, visual tracking remains challenging, mainly because of the considerable appearance changes caused by occlusion, deformation, abrupt motion, illumination variation, and background clutter. Features based on convolutional neural networks (CNNs) have recently achieved state-of-the-art results on a wide range of visual recognition tasks; therefore, understanding how best to utilize the rich feature hierarchies in CNNs is important for robust visual tracking. In view of the problem of fast-moving, scaling, and rotating targets during tracking, a kernel correlation adaptive target-tracking approach based on convolutional features was proposed in this study. A CNN was introduced to extract high- and low-layer convolutional features. High- and low-layer convolution response maps were obtained using the kernel correlation filter algorithm proposed in this study, and the target position was estimated with a coarse-to-fine method. The target scale was estimated with a 1D scale correlation filter to realize adaptive target tracking, and the kernel correlation filter was updated in real time. We tested the proposed algorithm on typical video sequences from public data sets. These data sets involve challenging factors, such as illumination change, partial occlusion, scale change, and complex background. We compared our method with excellent tracking algorithms, such as high-speed tracking with kernelized correlation filters, adaptive color attributes for real-time visual tracking, and real-time compressive tracking. For a quantitative comparison, we used two evaluation metrics, namely, the average center error and the average overlap ratio. The target-tracking experiments showed that the proposed filter algorithm performed better than the original comparative filter algorithms: the average center position error was reduced by 20%, and the average overlap rate was increased by 12%. The center errors in the video sequences Singer1, Car4, Jogging, Girl, Football, and MotorRolling were 8.71, 6.83, 3.96, 3.91, 4.83, and 9.23 pixels, respectively. The tracking overlap ratios of these video sequences were 0.969, 1.00, 0.967, 0.994, 0.967, and 0.512, respectively.
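The coarse-to-fine localization step can be sketched as follows; the linear-kernel response and the 5-pixel refinement radius are illustrative assumptions (the paper uses a kernelized filter).

import numpy as np

def response_map(filter_f, feat):
    """Correlation response via FFT; filter_f is the learned filter in the
    frequency domain, feat a spatial feature map of the same shape."""
    return np.real(np.fft.ifft2(filter_f * np.conj(np.fft.fft2(feat))))

def coarse_to_fine(resp_high, resp_low, radius=5):
    """Peak of the high-layer (semantic) response gives a coarse location,
    refined within a small window of the low-layer (spatial) response."""
    ci, cj = np.unravel_index(np.argmax(resp_high), resp_high.shape)
    i0, j0 = max(0, ci - radius), max(0, cj - radius)
    win = resp_low[i0:i0 + 2 * radius + 1, j0:j0 + 2 * radius + 1]
    di, dj = np.unravel_index(np.argmax(win), win.shape)
    return i0 + di, j0 + dj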
In this study, a CNN was introduced to extract high- and low-layer convolutional features. The high-layer features were used to discriminate the target from the background, whereas the low-layer features were adopted to predict the target position. The coarse-to-fine method was applied to locate the target position accurately, which solved the problem of large tracking errors due to target rotation and scale change. This method improves tracking performance and can update its learning in real time. The experimental results indicate that the proposed approach maintains good robustness and high adaptability even in elaborate scenes, such as those with a changing or fast-moving target scale, occlusion, and various illumination conditions.
Keywords: target tracking; convolution feature; correlation filter; kernel function; discriminant model
Abstract: The human visual system can acquire regions of interest in different scenes based on the visual attention mechanism. Each image contains one or more salient objects. Saliency detection imitates the visual attention mechanism to obtain the important information in an image, thereby improving the efficiency and accuracy of image processing. Saliency detection methods can be used not only in detecting a target object but also in image annotation and retrieval, object recognition, image clipping, image segmentation, image compression, and other fields. Saliency detection is a research hot spot in computer vision. Although existing saliency detection methods have achieved good results, several problems remain, such as the blurring of salient boundaries due to foreground and background noise; hence, the accuracy of saliency detection should be improved. Saliency detection methods based on pixels or regions, such as superpixels, can effectively describe the features of salient regions. However, these pixels or regions exist in isolation and carry no real object meaning; that is, complete descriptions of objects are lacking. Objectness detection obtains object information through sliding windows. We propose a saliency detection algorithm via object enhancement and sparse reconstruction (OESR), which introduces object descriptions while preserving the effective description of salient features, to solve the problem of fuzzy boundaries and improve the accuracy of image saliency detection. The objectness detection method is not used to take windows directly as the final salient objects; instead, we treat the window information as an object description that enhances the effectiveness of the salient features. The input image is segmented into superpixels, yielding several superpixel regions. A center-weighted color spatial distribution model is adopted, based on the idea that widely distributed colors are less likely to belong to a salient region. This model utilizes the color information of an image, but it operates on pixels, so the result lacks structured information; we therefore compute the color spatial distribution feature on superpixels to introduce structured information. First, from the foreground point of view, a Gaussian mixture model is used to model all colors, and the probability of each pixel belonging to the c-th color component is calculated. The probability of each superpixel for the c-th color component is computed from the probabilities of the pixels within that superpixel. The color spatial distribution based on superpixels is then calculated from these probabilities and location information, and the superpixel color spatial distribution map is used as the foreground saliency map. Second, from the background point of view, we introduce a sparse reconstruction error, based on the idea of contrast, to describe the feature difference between a superpixel and its surroundings. We construct the background template from the superpixel features on the image boundaries. The template is pretreated with the k-means clustering algorithm, which combines the boundary superpixels to obtain representative boundary features; superpixels with similar features are merged in each direction, yielding good background templates. The optimized background template is used as a sparse representation dictionary to compute sparse reconstruction errors. The reconstruction error of a superpixel is corrected within its 8-neighborhood to remedy the region discontinuity caused by the oversegmentation of the image. After correction, the salient region becomes smooth, and the sparse reconstruction error is taken as the saliency value to obtain the background difference map. Finally, from the object point of view, the target enhancement coefficient is calculated using the objectness detection method. We use a fast target detection method to obtain a certain number of proposed windows at various scales, each of which is assigned an object score based on the possibility of its containing an object. If a pixel belongs to a salient region, then the more target windows contain this pixel, the higher its object scores and the greater its significance. The target enhancement coefficient is calculated from the object scores of the proposed windows. The foreground saliency map and the background difference map are fused with the target enhancement coefficient as a guide, producing a high-contrast saliency map with a prominent foreground and a suppressed background. The proposed algorithm is compared with 12 methods on two public data sets (i.e., MSRA10k and VOC2007). Precision, recall, F-measure, and mean absolute error (MAE) are evaluated on the two data sets. The visual comparison in the experiments indicates that the salient objects detected using OESR are complete and accurate, and the method can also handle images with complex backgrounds effectively. OESR uses the target enhancement coefficient in the saliency map fusion step; the final salient region has high brightness, and background suppression works efficiently. The P-R curve, average precision, average recall, and average F-measure indicate that OESR holds advantages on the three evaluation indexes. Compared with other methods, our method improves recall by 4.1% on the MSRA10k data set; on the VOC2007 data set, recall improves by 18.5% and F-measure by 3.1%. Recall represents the degree to which the salient content of images is retrieved; the improvement in recall shows that the color features and sparse reconstruction features describe salient features effectively, and the introduction of object information ensures the integrity of salient regions. The MAE evaluation index also reflects the advantage of OESR in overall performance. A new saliency detection method is proposed in this study. The method uses the color spatial distribution and the sparse reconstruction error to produce saliency maps, and object enhancement coefficients are adopted in combining the two maps to improve the accuracy of the final saliency map. Experimental results show that the algorithm detects accurate salient regions that agree with human vision characteristics. The method is suitable for saliency detection, target segmentation, and image annotation based on saliency analysis.
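A minimal sketch of the objectness-guided fusion follows; the per-pixel score accumulation and the multiplicative fusion rule are plausible readings of the description, not the paper's exact formulas.

import numpy as np

def objectness_coefficient(windows, scores, shape):
    """Accumulate window scores per pixel: pixels covered by many
    high-scoring proposals receive larger enhancement coefficients.
    windows are integer boxes (x0, y0, x1, y1)."""
    obj = np.zeros(shape)
    for (x0, y0, x1, y1), s in zip(windows, scores):
        obj[y0:y1, x0:x1] += s
    return obj / (obj.max() + 1e-12)

def fuse_saliency(fg, bg_err, obj):
    """Fuse the foreground saliency map and the background difference map,
    guided by the objectness coefficient."""
    s = fg * bg_err * (1.0 + obj)
    return s / (s.max() + 1e-12)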
Keywords: saliency detection; global color contrast; sparse reconstruction; error propagation; object enhancement
Abstract: Object category prior knowledge is typically used to extract image features in popular approaches to the image description task based on deep convolutional neural network (CNN) and long short-term memory (LSTM) models. The task of image description involves detecting numerous objects in an image and generating a highly accurate description sentence. In general, people are sensitive to the objects in an image and direct considerable attention toward them when describing it. However, the scene is also important in image description because objects are typically described within a specific scene (e.g., place, location, or space); otherwise, a sentence may lack semantic information, leading to a poor-quality description of a candidate image. In current popular approaches, scene category and information are not taken seriously, and scene category prior knowledge is frequently disregarded. Consequently, the generated sentences do not describe the scene correctly, and positional relationships are easy to misjudge, which lowers the quality of these sentences. The effects of object category and scene category on generating a description sentence for an image are surveyed and studied to address this problem; both factors prove useful for producing an accurate and semantic sentence. Based on these findings, a novel framework, called fusion of scene and object category prior knowledge, in which scene category prior knowledge and object category prior knowledge are fused, is proposed and designed in this work to generate accurate descriptions of images, improve the quality and semantics of sentences, and achieve efficient performance. The objects and scene in an image are detected using CNN models optimized on large-scale datasets, and transfer learning is mainly utilized for the object category and scene category prior knowledge. A deep CNN model for scene recognition (CNN-Scene, abbreviated as CNN-S) is trained on the large-scale scene dataset Places205 so that its CNN features include considerable scene category prior information. The pretrained parameters are transferred to CNN-S to extract CNN features that capture the scene category and information in a candidate image, and the parameters are further optimized by fine-tuning. Another deep CNN model (CNN-Object, abbreviated as CNN-O) is optimized on ImageNet, a large-scale dataset for object recognition. Its parameters are applied to the CNN-O model through transfer learning and continuously trained via fine-tuning to determine the object category and information in the image. The CNN features from CNN-S and CNN-O are input into scene-based and object-based language models, respectively. These language models comprise two stacked LSTM layers constructed via factoring: the first layer receives the embedding features of words, and the outputs of the first layer are combined with the CNN feature of the candidate image to form multimodal features that are input into the second LSTM layer. A fully connected layer is used as a classification layer to map the feature vector to category information, and a softmax function computes the probability of each word in the vocabulary, which contains all the words appearing in the reference sentences of the training dataset. A late fusion strategy based on a weighted average calculates the final score for each word, and the output at the current time step corresponding to the maximum score is the expected word. The words generated at all time steps form the description sentence for the candidate image. Three popular public datasets, namely, MSCOCO, Flickr30k, and Flickr8k, are used to evaluate the effectiveness of the proposed method, following the usage protocols of the three datasets and other popular approaches. Three evaluation metrics are adopted: BLEU for evaluating consistency and precision, METEOR for reflecting the precision and recall of words, and CIDEr for evaluating the semantics of candidate sentences. Results on the three datasets demonstrate that the proposed model significantly improves performance on the three metrics. In particular, the performance on the CIDEr metric is 1.9% and 6.2% higher than that of the object-based and scene-based models, respectively, on the MSCOCO dataset. Performance also improves on the Flickr30k dataset by 1.7% and 2.6% over the two benchmark models. For the Flickr8k dataset, CIDEr reaches 52.8%, outperforming the two benchmark models by 9% and approximately 11%. On the BLEU and METEOR metrics, the proposed model also performs better than the models based only on object or scene. The performance of the proposed model on the three datasets surpasses that of most current state-of-the-art methods on multiple metrics. The experimental results show that the proposed model is effective. The model considerably improves performance compared with the benchmark models (the object-based and scene-based models) and most popular state-of-the-art approaches. The results also indicate that the proposed model can generate more semantic description sentences for an image because it captures the relationships between objects and scene when scene category information is applied. However, the performance of the proposed model on the BLEU metric should be further improved. In future work, additional prior knowledge, such as action categories and the relationships among objects, will be fused into the proposed framework to further improve the quality of the generated sentences, particularly their accuracy. Other novel techniques, such as residual networks for effective CNN features, region-based CNNs for accurate object recognition, and gated recurrent units for a concise language model, will be used to further improve performance.
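The late fusion step reduces to a weighted average of the two word distributions, as in the sketch below (the equal weight w = 0.5 and the three-word vocabulary are assumptions for illustration).

import numpy as np

def late_fusion_step(p_scene, p_object, w=0.5):
    """Weighted average of the two language models' per-word probability
    vectors; the arg-max index is emitted as this time step's word."""
    p = w * p_scene + (1.0 - w) * p_object
    return int(np.argmax(p)), p

# vocab = ["a", "dog", "beach"]
# idx, _ = late_fusion_step(np.array([.2, .5, .3]), np.array([.1, .3, .6]))
# print(vocab[idx])   # "beach"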
Keywords: image description; convolutional neural network; long short-term memory; scene category; object category
Abstract: The distraction of various factors during object tracking makes tracking states unpredictable. For example, when a tracked object is occluded, ordinary trackers are likely to suffer model drift. This problem occurs when the target is temporarily influenced by other objects or illumination conditions: the training samples extracted from such frames are not as reliable as those extracted from normal scenes. If these samples are fed into the model-updating process, the tracking model deviates because of the unreliable samples, and the decision boundary between target samples and background samples becomes blurry. Consequently, the discriminability of the tracking model degrades, which in turn causes model drift. If we can tag the training samples with different confidence levels and use the tagged samples to train the tracking model, then the tracker can be expected to tolerate unexpected scenes. We base our work on the weighted margin support vector machine (WMSVM) classifier and the recently proposed structured support vector machine (SSVM) tracking algorithm. A weighted margin SSVM tracking model (WMSSVM) is proposed, which considers the confidence values of training samples, thereby enabling the SSVM tracking algorithm to adapt to different scenes. First, sample confidence is estimated according to the score range of the tracker and the overlap rate between the predicted target positions of the current and last frames. This estimate reliably reflects the variation of confidence through the entire tracking process. Second, a WMSSVM tracking model is built to train on samples of different confidence levels, such that these samples exert different influences on the decision boundary of the trained tracker. Accordingly, the WMSSVM model can adaptively adjust to the different scenes encountered during tracking. Finally, we show that the tracking model can be solved with a dual coordinate descent algorithm.
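A sketch of the sample-confidence estimate follows; the equal weighting of the temporal-overlap and score-range terms and the max_range normalizer are assumptions, not the paper's exact formula.

def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax0, ay0, ax1, ay1 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx0, by0, bx1, by1 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def sample_confidence(box_t, box_prev, score_range, max_range):
    """Confidence in [0, 1] combining the overlap of consecutive predicted
    positions with the tracker's response score range."""
    return 0.5 * iou(box_t, box_prev) + 0.5 * min(1.0, score_range / max_range)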
We evaluate our tracker on an object tracking benchmark, namely, OTB-100, which contains 100 challenging video sequences. Compared with the base tracker, the dual linear structured support vector machine (DLSSVM), the proposed WMSSVM gains 1% in one-pass evaluation (OPE) precision and 2% in OPE overlap, indicating that the proposed WMSSVM model outperforms the original SSVM model used by scale-DLSSVM in OPE precision and OPE success. The tracking algorithm using the WMSSVM model improves by 3.4% in background clutter scenes, 3.4% in deformation scenes, 1.4% in fast-motion scenes, 3.4% in motion blur scenes, 3.6% in occlusion scenes, and 4.3% in out-of-view scenes, all of which are typical difficult scenes in tracking. The proposed tracker performs worse than the original scale-DLSSVM tracker on one sequence (its OPE precision score drops from 0.459 to 0.056). The reason is the drastic illumination variation that occurs at approximately the 60th frame, lasts for approximately nine frames, and influences nearly the entire camera scene; this condition is difficult for the tracker to handle. The proposed WMSSVM tracker also shows promising results compared with other state-of-the-art tracking algorithms. Although our tracker scores lower than hierarchical convolutional features for visual tracking (HCFT) in OPE precision, two observations can be made. First, the OPE precision metric uses a fixed distance of 20 pixels between the predicted target position and the ground truth position, whereas the OPE success metric uses the area under the curve as its score; thus, the better performance of WMSSVM in OPE success is more convincing. Second, HCFT uses high-dimensional features from an offline neural network trained on a large object detection dataset, which is time-consuming, whereas the proposed WMSSVM uses only online-extracted features consisting of the Lab color feature and the local rank transformation feature. In this study, sample confidence is first incorporated into the SSVM tracking model, and the WMSSVM tracking model and its optimization method are then proposed. The effectiveness of the proposed method is validated on a tracking benchmark dataset with 100 sequences. The proposed method can track objects in complicated scenes and exhibits remarkable performance in videos with background clutter, deformation, occlusion, motion blur, and fast motion.
Keywords: object tracking; structured support vector machine; sample confidence; discriminative classifier
Abstract: Methods for calibrating a linear (line-scan) camera are limited, and most require special, high-accuracy equipment. A linear camera is calibrated to obtain its focal length, principal point, and distortion parameters. The linear model cannot be solved without additional constraints because the imaging model of a linear camera differs from that of a planar camera. This study proposed a new method for calibrating a linear camera based on a two-axis pan-tilt mechanism, using angle measurements and a black-and-white striped board. We reduced the number of internal, external, and distortion parameters and made the model solvable by simplifying the planar camera model and supplying considerable constraint information. We mainly used the least squares method and iterative optimization to obtain stable results. Results showed that the accuracy of this study reached the same micron level as other research that used expensive equipment. The most popular method for planar camera calibration is Zhang's method, which uses a chessboard to obtain world and image coordinate points and exploits the correspondence between these points to calculate the homography matrix. The internal parameters can be determined from the homography matrix via singular value decomposition, and the distortion parameters are obtained using Brown's method. However, a linear camera model differs from a planar camera model; therefore, Zhang's method cannot be used to calibrate a linear camera, and the constraints are insufficient to solve the model. We established a linear camera model by limiting the dimensions of the planar camera model to one, based on the planar camera model and Zhang's method. Only two internal parameters must be solved, namely, the focal length and the principal point. However, many external parameters had to be calculated, and the constraints were inadequate, rendering the model unsolvable. We designed a calibration board to create additional constraints and reduce the unknown external parameters, and we proposed a calibration procedure that adjusts the pose of the two-axis pan-tilt mechanism and the black-and-white striped board. In this way, the external parameters were reduced to only two, yet the linear camera model remained unsolvable. We therefore used the two-axis pan-tilt mechanism to obtain angle information to solve the problem. We calculated the internal and external parameters of the linear camera by using the edge points of the black-and-white striped board obtained via edge detection, the angles measured by the two-axis pan-tilt mechanism, the geometric model, and the least squares method. We acquired initial values of the internal parameters and integrated them into the distortion formula, simplified from the distortion model of the planar camera, to determine initial values of the distortion parameters. Iterative optimization was then performed on the internal and distortion parameters to obtain stable results. The applicability and precision of this calibration method depend on three factors: the precision of the two-axis pan-tilt mechanism, the lens angle, and the resolution of the CCD camera.
In the experiment, we repeatedly measured the internal and distortion parameters of three lenses (6 mm, 8 mm, and 10 mm) with the proposed method and analyzed the iterative optimization of the internal and distortion parameters, which achieved a stable result within four iterations. We then used the relative standard deviation (RSD) to quantify calibration accuracy. The RSD of the focal length of the three lenses was less than 0.05%, whereas the RSD of the principal point was less than 0.1%; therefore, the proposed method is feasible and stable. Focal length calibration accuracy was better than 5 μm, and principal point calibration accuracy was better than 3 μm. Compared with other studies that used expensive calibration equipment, we achieved the same micron-level accuracy at a low cost. We simply used a two-axis pan-tilt mechanism and a printed calibration board, and calibration can be performed anywhere with this simple equipment, saving significant money, time, and work. Nevertheless, the applicability of this method depends on the precision of the two-axis pan-tilt mechanism, the lens angle, and the resolution of the CCD camera. If the step size of the two-axis pan-tilt mechanism is 0.012 9° and the resolution of the CCD camera is 1 436, then this calibration method works best on a 16 mm lens; for a lens with a longer focal length, the precision of the two-axis pan-tilt mechanism should be increased. This study proposed a new method for calibrating a linear camera, which established the model of a linear camera, derived the formulas, and provided the calibration process; the feasibility of the method was also analyzed. The linear camera model was built by simplifying the planar camera model. The number of parameters to be solved was reduced to a minimum by adjusting the pose of the two-axis pan-tilt mechanism and the black-and-white striped board. We used the angle information from the two-axis pan-tilt mechanism to make the model solvable, and iterative optimization yielded reliable and stable internal and distortion parameters. Three lenses with different focal lengths were calibrated with our method to calculate the internal and distortion parameters. The results showed that the accuracy of this study reached the same micron level as that of other research using expensive equipment. We can test only lenses with focal lengths not exceeding 16 mm because of the limited precision of the two-axis pan-tilt mechanism and the resolution of the CCD camera; calibrating lenses with longer focal lengths will require a two-axis pan-tilt mechanism with higher precision. Further study will be conducted with better equipment to test more lenses under different conditions.
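For reference, the RSD used above is simply the sample standard deviation divided by the mean, as in this tiny sketch (the repeated focal-length values are hypothetical).

import numpy as np

def rsd(values):
    """Relative standard deviation in percent: 100 * sample std / mean."""
    v = np.asarray(values, float)
    return 100.0 * v.std(ddof=1) / v.mean()

# hypothetical repeated focal-length calibrations of a 6 mm lens (mm):
# print(rsd([6.003, 6.001, 6.002, 6.004]))   # well below 0.05% indicates stability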
Abstract: The Landsat-8 satellite applies the linear-array push-broom method to obtain imagery and then processes the captured image by scene to generate standard scene products. The demand for wide-area remote sensing data is increasing, and long-strip satellite data have high application value. Long-strip satellite data are typically acquired by mosaicking several standard scenes of orthographic products. However, this classical method has low efficiency and may fail to produce a long-strip orthographic product in the presence of excessive cloud cover. A novel orthographic method for Landsat-8 long-strip satellite imagery is proposed in this study to overcome the shortcomings of the conventional method. The proposed method operates on the entire long-strip satellite image as a unit, and the long-strip orthographic product is generated directly without image mosaicking. The procedure comprises three steps: ground control point correlation, long-strip geometric precise correction, and ortho-rectification. First, ground control points are correlated with the long-strip systematic imagery using the normalized cross-correlation method. The GLS2005 control point database is stored by scene; hence, the control points in every scene are reprojected according to the long-strip UTM zone code and merged to generate a long-strip control point database. Second, the long-strip adjustment equation is formulated under the strip constraint by using the ground control point correlation results and satellite parameters, such as ephemeris and attitude parameters. A precise line-of-sight model is built after least squares adjustment, and an appropriate ground control point optimization method is introduced to accelerate the solution of the adjustment equation. Finally, the long-strip orthographic product is resampled using the precise line-of-sight model with reference to a digital elevation model. Experiments were designed and conducted to verify the performance of the proposed long-strip method. When the long-strip imagery contains fewer than 15 scenes, the long-strip satellite products obtained using the proposed method achieve geometric accuracy within 12 m. The processing efficiency of the proposed method is twice that of the conventional method, mainly because mosaicking time and the processing of overlapping regions are saved. For scenes covered by clouds, the proposed method can still produce long-strip orthographic products with qualified geometric accuracy by using the surrounding scenes. The proposed method achieves qualified geometric accuracy and high processing efficiency within a certain number of scenes and performs efficiently under cloud-covered conditions.
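The ground control point correlation step relies on normalized cross-correlation, sketched below with an exhaustive search; production correlators would add image pyramids and subpixel refinement, so this is a simplified illustration.

import numpy as np

def ncc(patch, chip):
    """Normalized cross-correlation between an image patch and a control
    point chip of the same size; values near 1 indicate a match."""
    p = patch - patch.mean()
    c = chip - chip.mean()
    denom = np.sqrt((p * p).sum() * (c * c).sum())
    return float((p * c).sum() / denom) if denom else 0.0

def match_gcp(image, chip, stride=1):
    """Exhaustive NCC search for the best chip location in the image."""
    h, w = chip.shape
    best, best_ij = -1.0, (0, 0)
    for i in range(0, image.shape[0] - h + 1, stride):
        for j in range(0, image.shape[1] - w + 1, stride):
            s = ncc(image[i:i + h, j:j + w], chip)
            if s > best:
                best, best_ij = s, (i, j)
    return best_ij, best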
Abstract: Remote sensing image classification refers to the use of computers to analyze the spectral and spatial information of various land cover objects in remote sensing images, divide the feature space into non-overlapping subspaces, and assign each pixel to a specific subspace. In computer vision, this procedure aims to assign a predefined semantic label to each pixel in an image and is also called "semantic segmentation." The rapid development of computer application technology, aerospace, and sensor technology in recent years has produced numerous methods for acquiring different types of remote sensing image data. As an important aspect of remote sensing technology, the classification of high-resolution remote sensing imagery has gained considerable attention. A novel image classification method is proposed in this study. This method is based on a fully connected conditional random field (CRF) model combined with a convolutional neural network (CNN). The two models are merged to utilize their respective advantages and further improve classification accuracy for remote sensing images.

On the one hand, most traditional classification methods rely on artificial experience to extract the characteristics of training samples. After learning, a single-layer feature without a hierarchical structure is obtained. These methods generally have shallow structures, and the features they produce are relatively simple. By contrast, as a new research direction in the field of machine learning, deep learning can transform the feature representation of training samples from the original space into a new feature space layer by layer and learn to automatically yield a hierarchical feature representation, which is conducive to classification and feature visualization. In recent years, this new subject has achieved significant breakthroughs in computer vision applications, such as visual recognition challenges, image classification, and object detection. As one of its representatives, the CNN has been widely used in pattern recognition because it avoids complex image preprocessing. We use a CNN in this study to replace traditional classification methods and obtain the essential features of the input image. On the other hand, traditional classification methods are based on the spectral statistical characteristics of pixels. These methods are also known as pixel-wise classification methods. They analyze the spectral information of each pixel individually by using a statistical learning algorithm, such as support vector machine (SVM), maximum likelihood classification, the minimum distance method, decision trees, and K-means clustering. These methods typically produce high classification errors and results with low accuracy because they do not consider the rich spatial contextual information of images. We draw support from the probabilistic graphical model, one of the research hot spots in machine learning and pattern recognition, to solve this problem. With this model, researchers can not only use Bayesian probability theory to solve the problem but also draw on mature graph theory to handle contextual information. As an excellent representative of probabilistic graphical models, the CRF model for 1D sequence data processing was proposed by Lafferty in 2001. This model can incorporate spatial contextual information in both the labels and the observed data, and its uniqueness is that it can flexibly model the posterior distribution directly. The early CRF model was mainly used in natural language processing and speech recognition, and it was then successfully applied to image processing by Kumar and Hebert in 2003. Although considerable research has been conducted on CRF models, the conventional CRF still exhibits oversmoothing problems. Therefore, we add a regional restriction (RR) to enhance the consistency of the classification results in connected areas and protect the edge structure of land cover objects.

In summary, the steps of our proposed method are as follows. We preclassify the entire remote sensing image into certain land cover types via the CNN and use the resulting class membership probabilities as the unary potential in the CRF model. The pairwise potential of the CRF is defined by a linear combination of Gaussian kernels, which forms a fully connected neighbor structure instead of the common four-neighbor or eight-neighbor structure. RR is also incorporated into the framework to promote the consistency of connected areas: we use the mean shift algorithm to obtain superpixels and correct the classification results by calculating their average posterior probabilities. A highly efficient approximate inference algorithm, namely, mean field inference, is employed for the final model.

Our experimental results, which are based on three different remote sensing images, demonstrate that the proposed classification framework exhibits competitive quantitative and qualitative performance, effectively alleviating salt-and-pepper classification noise, mitigating the oversmoothing phenomenon, and protecting the edge structure of land cover objects. The experiments use class accuracy, overall classification accuracy (OA), average classification accuracy (AA), and the kappa coefficient for the quantitative analysis. Compared with those of SVM, CNN, and the fully connected CRF, the final accuracies of our experiments are significantly improved: AA is increased by 3.28 percentage points, OA by 3.22 percentage points, and the kappa coefficient by 5.07 percentage points.

Traditional classification methods have two shortcomings. The first is insufficient feature extraction, which leads to inaccurate classification results. The second is that pixel-based methods consider only the information of single points and disregard the mutual influence of surrounding points. The combination of CNN and CRF can not only obtain the essential characteristics of pixels but also consider the contextual information of an image. Therefore, our method can achieve accurate classification results. Moreover, the integration of RR protects the edge structure of land cover objects and yields a satisfactory classification performance. The proposed method is accurate and effective, and it can be used in remote sensing image classification.
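The RR correction step described above (averaging class posterior probabilities within mean-shift superpixels and relabeling each region) can be sketched as follows. This is a minimal illustrative reconstruction, not the authors' code: the function name and array layout are assumptions, and the probability map is assumed to come from the CNN/CRF stage.

```python
import numpy as np

def regional_restriction(prob_map, superpixels):
    """Average class posteriors within each superpixel and relabel every
    pixel with the argmax of the averaged posterior.

    prob_map   : (H, W, C) array of per-pixel class probabilities
                 (e.g., CNN softmax output refined by CRF inference).
    superpixels: (H, W) integer array of region ids
                 (e.g., from a mean shift segmentation).
    """
    h, w, c = prob_map.shape
    flat_prob = prob_map.reshape(-1, c)
    flat_sp = superpixels.ravel()
    labels = np.empty(h * w, dtype=np.int64)
    for sp_id in np.unique(flat_sp):
        mask = flat_sp == sp_id
        mean_prob = flat_prob[mask].mean(axis=0)  # average posterior over the region
        labels[mask] = mean_prob.argmax()         # one consistent label per region
    return labels.reshape(h, w)
```

Any segmentation that groups spatially connected, spectrally homogeneous pixels could play the superpixel role here; the abstract's choice of mean shift is one common option.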
Abstract: The community is one of the main places in a city where humans reside, and community security is an important part of the development of a harmonious society. Fire is one of the main threats to community security, and a large number of accidents show that toxic fire fumes and high-temperature environments are the main causes of casualties. A numerical simulation model can analyze the smoke flow and temperature distribution caused by a fire. Therefore, research on community fire scene simulation models is important for fire rescue and emergency evacuation. Fire scene simulation requires models of indoor and outdoor objects and the environment, as well as a large number of numerical calculations, requirements that a traditional map cannot meet.

In this study, we propose a fire scene numerical simulation model for community security based on research on indoor and outdoor maps. The model establishes the distribution of community objects through traditional mapping technology and utilizes indoor mapping technology to create the interior scenes of buildings. The contents of the indoor map include the door, fire facility, passage, and room range layers; the contents of the outdoor map include the community boundary, transportation, vegetation, residential building, fire facility, and other layers. A 3D fire scene model is reconstructed using 3D modeling technology, a fire numerical equation is imported for the simulation calculation, and the model can be displayed in 3D dynamics. The fire numerical simulation is calculated using the Fire Dynamics Simulator (FDS) software, which was developed by the National Institute of Standards and Technology in the USA. FDS solves the fire numerical simulation equations to calculate the smoke diffusion and heat transfer processes, taking a single grid cell as the calculation unit. In theory, a finer grid provides more accurate calculation results; however, for the same computational volume, a finer grid generates far more cells, and computation on numerous cells is costly in terms of time. A subregional variable-scale meshing method is therefore proposed to improve the efficiency of the fire numerical calculation for the complex structure of a community. The following three rules should be followed when setting up subregional variable-scale meshing. First, a number of calculation regions should be divided according to the shapes of the buildings. Different mesh regions can overlap, abut, or be disjoint, but the region containing the fire should not be split. Second, the grid size should be set according to the complexity of the community structure and the distance from the fire: fine grids are used for the indoor and outdoor areas near the fire scene and for the part of the community near the ground, whereas coarse grids are used for regions that cannot be affected by the fire. Third, abutting regions should follow mesh boundary alignment; the most important rule of mesh boundary alignment is that abutting cells should have either identical cross-sections or cell sizes in integer ratios. We select the Hongshuwan community as the study area for the fire numerical simulation experiments and apply three different meshing methods to the fire scene simulation, namely, fine meshing, coarse meshing, and subregional variable-scale meshing.
The burning changes of indoor fires, including smoke flow and temperature distribution, can be observed in the results of a single-layer fire simulation model of Hongshuwan community. By comparing the three meshing methods, we confirm that calculation accuracy increases as the grid becomes finer. Therefore, we take the results of the fine meshing method as the standard against which the computational accuracy and efficiency of the other two methods are compared. In terms of calculation speed, the subregional variable-scale meshing and coarse meshing methods are 8 times and 24 times faster than the fine meshing method, respectively. The temperature simulation results show that the temperature curves of the subregional variable-scale meshing method approximate those of the fine meshing method, whereas the differences between the coarse meshing and fine meshing methods are significant. The correlation coefficient of the temperature simulation results between the subregional variable-scale meshing and fine meshing methods is greater than 0.96, which proves that the results of the two methods are highly correlated. Compared with the smoke diffusion results of the fine meshing method, the minimum visibility error of the subregional variable-scale meshing method is 2 m, whereas that of the coarse meshing method is 13.5 m. These results show that the smoke simulation accuracy of the subregional variable-scale meshing method is higher than that of the coarse meshing method: its accuracy is close to that of the fine meshing method, while its efficiency is considerably higher.

In this study, a simulation model for a community fire scene is constructed to extend the application of indoor maps. The smoke diffusion and temperature distribution patterns can be analyzed through the fire scene numerical simulation model. Subregional variable-scale meshing improves simulation efficiency while maintaining the level of simulation accuracy. The model can be applied to fire rescue and emergency evacuation at the community scale, and the new approach has important application value.
Keywords: fire scene simulation; indoor map; subregional variable-scale meshing; three-dimensional modeling; numerical simulation
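To make the third meshing rule concrete, the following small Python check (a hypothetical sketch, not part of FDS or the paper) verifies that two abutting uniform meshes satisfy boundary alignment: along the shared face, cell sizes must be identical or in an integer ratio.

```python
from dataclasses import dataclass

@dataclass
class Mesh:
    """A uniform rectilinear mesh: physical extent and cell count per axis."""
    xmin: float
    xmax: float
    nx: int
    ymin: float
    ymax: float
    ny: int
    zmin: float
    zmax: float
    nz: int

    def cell_size(self, axis: str) -> float:
        lo, hi, n = {
            "x": (self.xmin, self.xmax, self.nx),
            "y": (self.ymin, self.ymax, self.ny),
            "z": (self.zmin, self.zmax, self.nz),
        }[axis]
        return (hi - lo) / n

def aligned(a: Mesh, b: Mesh, axis: str, tol: float = 1e-9) -> bool:
    """Boundary alignment rule for abutting meshes: along the shared face,
    cell sizes must be identical or in an integer ratio."""
    da, db = a.cell_size(axis), b.cell_size(axis)
    ratio = max(da, db) / min(da, db)
    return abs(ratio - round(ratio)) < tol

# A 0.1 m fine mesh abutting a 0.2 m coarse mesh at x = 2 is aligned on the
# shared y-z face (ratio 2); a 0.1 m / 0.15 m pairing (ratio 1.5) would not be.
fine = Mesh(0.0, 2.0, 20, 0.0, 2.0, 20, 0.0, 3.0, 30)    # 0.1 m cells
coarse = Mesh(2.0, 6.0, 20, 0.0, 2.0, 10, 0.0, 3.0, 15)  # 0.2 m cells
assert aligned(fine, coarse, "y") and aligned(fine, coarse, "z")
```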
Abstract: Crime geography research focuses on identifying spatiotemporal crime distribution patterns, establishing spatiotemporal crime-forecasting models through machine learning, formulating efficient police prevention and control protocols, and effectively preventing the occurrence of crimes. Existing research has shown that crime data are significantly unbalanced in space and time: most crimes are concentrated in the central urban area or in densely inhabited districts. Unbalanced data typically lead to a bias toward the majority class. Consequently, a classifier will display a poor recognition rate for the minority class, whereas minority-class areas are usually hot spots where crimes frequently occur. Accordingly, models trained on crime data (e.g., a genetic algorithm (GA)-back propagation (BP) neural network) become weak learners, which makes the desired prediction accuracy difficult to achieve.

This study presents a novel algorithm based on boosting ensemble learning to solve the aforementioned problem. A boosting ensemble utilizes more than one predictor for decision making and thus provides several advantages: 1) the design of a classifier ensemble aims to create a set of complementary/diverse classifiers and to apply an appropriate fusion method to merge their decisions; 2) an ensemble may outperform a standard single-classifier approach because it can apply the unique strengths of each individual classifier in the pool; and 3) ensembles tend to be robust and less prone to overfitting because they adopt mutually complementary models with different strengths. At the same time, a number of issues must be considered when using ensemble learning: 1) how to select a pool of diverse and mutually complementary individual classifiers; 2) how to design the interconnections among the classifiers in the ensemble, i.e., how to determine the ensemble topology; and 3) how to conduct the fusion step that controls the degree of influence of each classifier on the final decision. In consideration of these issues, the new algorithm uses the GA-BP neural network to build base classifiers and K-means clustering to integrate them, thereby converting weak learners into strong learners. The fusion step through K-means clustering is based on the observation that a base classifier's performance at one data point is frequently related to its performance at the data points around it, which allows the spatial relations among all the base classifiers to be fully considered. The proposed algorithm has two key steps: 1) training data are resampled using a boosting-by-reweighting method, which trains a designated number of base classifiers with the GA-BP neural network algorithm and stores them in a base classifier pool; 2) sample data are classified into several clusters using the K-means algorithm, and the base classifier with the highest forecasting accuracy is dynamically selected for each cluster to predict all data points that belong to it, as sketched below.
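The cluster-wise fusion step can be reconstructed roughly as follows. This is an illustrative sketch, not the authors' code: `kmeans_boosting_predict` is a hypothetical name, and `base_models` is assumed to hold regressors already trained on reweighted samples (any scikit-learn-style regressors, for instance `MLPRegressor` instances standing in for GA-BP networks, would work).

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_boosting_predict(base_models, X_train, y_train, X_test, n_clusters=8):
    """Fusion step: partition the sample space with K-means, then, for each
    cluster, dynamically select the base model with the lowest training MSE
    on that cluster and use it to predict the cluster's test points."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X_train)
    train_clusters = km.labels_
    test_clusters = km.predict(X_test)

    y_pred = np.empty(len(X_test))
    for c in range(n_clusters):
        tr = train_clusters == c
        te = test_clusters == c
        if not te.any():
            continue
        # Rank the pooled base models by their error on this cluster alone.
        errors = [np.mean((m.predict(X_train[tr]) - y_train[tr]) ** 2)
                  for m in base_models]
        best = base_models[int(np.argmin(errors))]
        y_pred[te] = best.predict(X_test[te])
    return y_pred
```

The design choice this illustrates is dynamic selection rather than weighted averaging: each cluster is served by whichever base learner happens to be locally strongest, which is how the algorithm exploits the spatial locality of classifier performance.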
The experimental results demonstrate that the new algorithm has two advantages over the Synthetic Minority Oversampling Technique Boosting (SMOTEBoosting) algorithm, which is widely used to solve regression and classification problems on unbalanced data: 1) the proposed algorithm reduces the mean squared error of the minority class and the overall mean squared error; the overall mean squared error of the K-means-boosting algorithm is 9.85E-05, which outperforms the SMOTEBoosting algorithm's 2.14E-04; and 2) the K-means-boosting algorithm maintains the balance between minority precision and recall better than the SMOTEBoosting algorithm: the minority recall of the K-means-boosting algorithm is approximately 52%, whereas that of the SMOTEBoosting algorithm is approximately 91%, but the minority precision of the K-means-boosting algorithm reaches 85%, far higher than that of the SMOTEBoosting algorithm (19%). The K-means-boosting algorithm also outperforms the AdaBoost algorithm, a classical ensemble learning algorithm, for two reasons: 1) it reduces the mean squared error of the minority class in unbalanced data and the overall mean squared error compared with AdaBoost; and 2) it improves minority recall and precision. The experiments also indicate that the number of clusters plays an important role in the algorithm: the overall mean squared error and the mean squared error of the minority class decrease as the number of clusters increases, but the rate of decline slows gradually and eventually approaches a limit, whereas the computational cost required for classifier integration continues to increase. No consensus rule currently exists for determining the number of clusters, and this variable can only be set manually based on the data.

This study proposes a boosting ensemble learning algorithm that integrates base classifiers using the K-means clustering algorithm to address the problem of unbalanced data regression. The application of the algorithm to the spatiotemporal prediction of police intelligence data proves that the method can handle the prediction of grid-based spatiotemporal intelligence data. Compared with traditional boosting algorithms that integrate base classifiers through a weighted average, the proposed method significantly reduces the overall mean squared error and the mean squared error of the minority classes. Similarly, compared with the SMOTEBoosting algorithm, which is commonly used to solve unbalanced data regression, the proposed method not only reduces the overall mean squared error of the sample data while reducing the mean squared error of the minority classes, but also maintains the balance between minority precision and recall. The algorithm can be extended to other similar areas where unbalanced data regression or classification problems need to be addressed.