Audio copy-move forgery detection and localization based on multi-feature decision fusion

Zhang Guofu; Xiao Rui; Su Zhaopin; Lian Chensi; Yue Feng

Published: 2022-09-17
DOI: 10.11834/jig.210747
2022 | Volume 27 | Number 9


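Zhang Guofu^1,2,3,4, Xiao Rui^1, Su Zhaopin^1,2,3,4, Lian Chensi^5, Yue Feng^1,4 (1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601; 2. Key Laboratory of Knowledge Engineering with Big Data (Hefei University of Technology), Ministry of Education, Hefei 230601; 3. Intelligent Interconnected Systems Laboratory of Anhui Province (Hefei University of Technology), Hefei 230009; 4. Anhui Province Key Laboratory of Industry Safety and Emergency Technology (Hefei University of Technology), Hefei 230601; 5. Institute of Forensic Science, Department of Public Security of Anhui Province, Hefei 230000)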

Abstract

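Objective With powerful audio editing software now widely available, ordinary users without professional expertise can easily edit, or even maliciously tamper with, digital audio files, which poses a serious challenge to digital audio authentication. In a copy-move forgery, a region of an audio recording is copied and pasted elsewhere in the same recording in order to alter its semantics. Because the characteristics of the forged segment closely match those of the original recording, detection is very difficult, and the problem has become a research focus in audio forensics. However, most existing methods rely on voice activity detection and can only determine whether an entire voiced segment has been tampered with; they cannot accurately locate the forged position. To address this, we propose an audio copy-move forgery detection and localization method based on multi-feature decision fusion. Method First, spectral-entropy-based voice activity detection divides the audio into silent and voiced segments, and the voiced segments are further split into syllables using the energy-to-entropy ratio. Then the pitch frequency, color auto-correlogram, and short-time energy features of each syllable are extracted. The similarity of any two syllables is measured by the dynamic time warping distance for the pitch frequency features, by the cosine distance for the color auto-correlogram features, and by the difference of the short-time energy sums for the short-time energy features. Finally, the forged positions are accurately located by multi-feature decision fusion. Result Comparative experiments on the relevant dataset show that the proposed method outperforms the compared methods in both precision and recall, each exceeding 90%. On average, detection precision improves by about 16%, recall by about 26%, and localization accuracy by about 45%. Moreover, after the dataset is subjected to common signal processing attacks, the method still achieves detection precision and recall above 94%, with average improvements of about 16% in precision and about 31% in recall. Conclusion The proposed method not only achieves higher detection precision, recall, and localization accuracy, but is also more robust to common signal processing attacks.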
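The three similarity measures named in the Method can be illustrated with a minimal Python sketch. It is not the authors' implementation: the absolute-difference local cost in the dynamic time warping recursion, the `frame_len` and `hop` parameters, and all function names are assumptions made for illustration, and the extraction of the pitch and color auto-correlogram features themselves is not shown.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences
    (e.g. pitch contours). Local cost |a_i - b_j| is an assumption."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two feature
    vectors, e.g. flattened color auto-correlograms."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # convention chosen here for all-zero vectors
    return 1.0 - dot / (norm_u * norm_v)


def short_time_energy(samples, frame_len, hop):
    """Sum of squared samples in each frame (hypothetical framing)."""
    last_start = len(samples) - frame_len
    return [sum(s * s for s in samples[i:i + frame_len])
            for i in range(0, last_start + 1, hop)]


def energy_sum_difference(energy_a, energy_b):
    """Absolute difference between the summed short-time energies."""
    return abs(sum(energy_a) - sum(energy_b))
```

For all three measures a smaller value means the two syllables are more alike, which is why each can be compared against a per-feature threshold.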

Keywords

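audio forensics; copy-move forgery detection and localization; multi-feature decision fusion; pitch frequency; color auto-correlogram; short-time energy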

Multi-feature decision fused detection and localization method for copy-move forgery of digital audio clips

Zhang Guofu^1,2,3,4, Xiao Rui^1, Su Zhaopin^1,2,3,4, Lian Chensi^5, Yue Feng^1,4 (1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China; 2. Key Laboratory of Knowledge Engineering with Big Data (Hefei University of Technology), Ministry of Education, Hefei 230601, China; 3. Intelligent Interconnected Systems Laboratory of Anhui Province (Hefei University of Technology), Hefei 230009, China; 4. Anhui Province Key Laboratory of Industry Safety and Emergency Technology (Hefei University of Technology), Hefei 230601, China; 5. Institute of Forensic Science, Department of Public Security of Anhui Province, Hefei 230000, China)

Abstract

Objective Forensic-oriented digital audio technology has developed rapidly as the volume of audio recordings has grown. Digital audio recordings are commonly used as evidence in legal disputes such as civil litigation. However, the original semantic content of a recording can be altered very easily with widely available digital audio editing software and online tutorials. Consequently, audio forensics faces the challenge of determining whether a recording is authentic or has been tampered with. A copy-move forgery distorts the original recording by copying an audio clip and pasting it elsewhere in the same recording. Unlike splicing and synthesis forgeries, the source and target segments of a copy-move forgery both come from the same recording. Attributes such as amplitude, frequency, length, noise, tone, and even velocity can therefore be well matched between the forged segments and the rest of the recording, especially for very short utterances. The need for blind tampering detection has driven research on blind audio forensics, in particular copy-move forgery detection and localization in digital audio recordings. However, most existing methods divide the recording into multiple very short segments using voice activity detection (VAD) techniques. Although they can identify two similar segments within a recording, their detection and localization accuracy remains limited. We propose a multi-feature decision fusion method for detecting and localizing audio copy-move forgeries. Method First, the audio recording is segmented into voiced and unvoiced parts by spectral-entropy-based VAD. Next, every voiced segment is further split into syllables, each containing only one Chinese character, according to the ratio of energy to spectral entropy. Then the pitch frequency, color auto-correlogram, and short-time energy features of each syllable are extracted. The similarity of any two syllables is computed from the dynamic time warping distance on the pitch frequency features, the cosine distance on the color auto-correlogram features, and the difference of the short-time energy sums on the short-time energy features. Finally, forgeries are accurately localized by multi-feature decision fusion over these three similarities. Specifically, for any two candidate syllables, if each of the three similarity measures is below its pre-specified threshold, a copy-move forgery is deemed to have occurred and the approximate forgery locations are preliminarily determined. Two new syllables are then constructed by extending both forged syllables by one frame, and the three similarity measures of the new syllables are computed and compared with the thresholds. If each is still below its threshold, the two syllables are extended by one more frame, and this repeats until one of the three measures exceeds its threshold. The positions of the two resulting syllables give the exact forgery locations. Result A classical database is used to generate our copy-move forgery dataset, which contains 500 authentic recordings and 500 forged recordings. Comparative analyses show that the proposed multi-feature decision fusion method achieves precision and recall of more than 97%. Specifically, over the compared methods, detection precision improves by roughly 16 percentage points, recall by about 26 percentage points, and localization accuracy by more than 45% on average. In addition, detection precision and recall still exceed 94% under common signal processing attacks such as Gaussian noise addition, low-pass filtering, down-sampling, up-sampling, and MP3 compression; under these attacks, detection precision improves by about 16 percentage points and recall by about 31 percentage points. Conclusion Our method not only achieves higher detection precision, recall, and localization accuracy, but is also more robust to common signal processing attacks.
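The decision fusion and frame-by-frame extension described in the Method can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' algorithm: each frame is reduced to a single scalar feature, the three distance functions are toy stand-ins for the DTW, cosine, and energy-sum measures, the threshold values are invented, and the segments are assumed to grow to the right only (the abstract does not state the direction of extension).

```python
def all_below(seg_a, seg_b, distance_fns, thresholds):
    """Decision fusion by unanimous vote: every distance measure
    must fall below its own threshold."""
    return all(fn(seg_a, seg_b) < t
               for fn, t in zip(distance_fns, thresholds))


def extend_match(frames, start_a, start_b, length, distance_fns, thresholds):
    """Grow two matched segments one frame at a time (rightward is an
    assumption) while all measures stay below their thresholds.
    Requires start_a < start_b. Returns the final matched length,
    or 0 if the initial pair is not judged a copy."""
    seg_a = frames[start_a:start_a + length]
    seg_b = frames[start_b:start_b + length]
    if not all_below(seg_a, seg_b, distance_fns, thresholds):
        return 0
    # Stop at the end of the signal or before the segments overlap.
    while start_a + length < start_b and start_b + length < len(frames):
        cand_a = frames[start_a:start_a + length + 1]
        cand_b = frames[start_b:start_b + length + 1]
        if not all_below(cand_a, cand_b, distance_fns, thresholds):
            break  # one measure exceeded its threshold
        length += 1
    return length


# Toy stand-ins for the three similarity measures (hypothetical).
def mean_abs_diff(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def max_abs_diff(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


def sum_diff(a, b):
    return abs(sum(a) - sum(b))


# Frames 1..3 have been "copied" to frames 6..8 in this toy signal.
frames = [0.1, 0.5, 0.9, 0.4, 0.2, 0.7, 0.5, 0.9, 0.4, 0.3]
fns = (mean_abs_diff, max_abs_diff, sum_diff)
thresholds = (0.05, 0.05, 0.05)  # invented values

# Start from a 2-frame candidate pair and extend it.
matched_len = extend_match(frames, 1, 6, 2, fns, thresholds)
print(matched_len)
```

In this toy signal the initial two-frame match grows to three frames and then stops. The fourth frame still passes the mean-difference test but fails the other two, which shows why requiring all measures to agree gives a tighter boundary than any single measure.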

Keywords

audio forensics; copy-move forgery detection and localization; multi-feature decision fusion; pitch frequency; color auto-correlogram; short-time energy
