Liu Yizhi, Tang Sheng, Wang Xiangdong, Lin Shouxun, Zhang Yongdong. Fusing audio-words with visual features for adult video detection[J]. Journal of Image and Graphics, 2012, 17(7): 791-797. DOI: 10.11834/jig.20120707.
Fusing audio-words with visual features for adult video detection
Multi-modality based adult video detection is an effective approach for filtering pornographic information. However, existing methods lack an accurate representation of audio semantics. Therefore, this paper presents a novel method that fuses audio-words with visual features for adult video detection. First, we propose a periodicity-based algorithm that segments audio streams into sequences of energy envelope (EE) units. Second, we present an audio semantics representation based on EE and Bag-of-Words (BoW), which describes the features of the EE units as the occurrence probabilities of audio-words. Integrated weighting methods are then used to fuse the detection results of the audio-words and the visual features. Furthermore, we propose a periodicity-based decision algorithm for judging adult videos, which works together with the preceding periodicity-based segmentation algorithm so that the periodicity is fully exploited. Our experiments show that our approach remarkably improves detection performance compared with the method based on visual features alone: the true positive rate reaches 94.44% while the false positive rate is 9.76%.
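
As a rough illustration of two ideas named in the abstract, the sketch below builds a Bag-of-Words histogram of audio-words and combines an audio score with a visual score by weighting. It is a minimal sketch under stated assumptions, not the paper's implementation: the abstract does not specify the EE features, the codebook construction, or the weighting scheme, so the names `audio_word_histogram`, `fuse_scores`, and `audio_weight`, the nearest-codeword assignment, and the simple linear weighting are all hypothetical choices made for the example.

```python
from collections import Counter
from math import dist


def audio_word_histogram(segment_features, codebook):
    """Assign each segment feature vector to its nearest codeword (an
    assumed way of forming audio-words) and return the normalized
    occurrence frequency of every audio-word."""
    counts = Counter()
    for vec in segment_features:
        nearest = min(range(len(codebook)), key=lambda k: dist(vec, codebook[k]))
        counts[nearest] += 1
    total = len(segment_features)
    return [counts[k] / total if total else 0.0 for k in range(len(codebook))]


def fuse_scores(audio_score, visual_score, audio_weight=0.5):
    """Linear weighted fusion of two detector scores in [0, 1].
    The linear form and the default weight are assumptions."""
    return audio_weight * audio_score + (1.0 - audio_weight) * visual_score


# Toy data: a two-word codebook and four 2-D segment features (made up).
codebook = [(0.0, 0.0), (1.0, 1.0)]
segments = [(0.1, 0.0), (0.9, 1.1), (1.0, 0.8), (0.2, 0.1)]
histogram = audio_word_histogram(segments, codebook)  # [0.5, 0.5]
fused = fuse_scores(audio_score=0.8, visual_score=0.4, audio_weight=0.25)
```

In this toy setup the histogram plays the role of the "occurrence probabilities of audio-words"; in practice a classifier would turn it into the audio score that is then fused with the visual score.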