High-capacity hierarchical steganography method for low-bit-rate speech streams

Su Zhaopin; Zhang Ling; Zhang Guofu

Published: 2022-12-19
DOI: 10.11834/jig.210307
2022 | Volume 27 | Number 12

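Su Zhaopin¹,²,³,⁴, Zhang Ling¹, Zhang Guofu¹,²,³,⁴
1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China
2. Key Laboratory of Knowledge Engineering with Big Data (Hefei University of Technology), Ministry of Education, Hefei 230601, China
3. Intelligent Interconnected Systems Laboratory of Anhui Province (Hefei University of Technology), Hefei 230009, China
4. Anhui Province Key Laboratory of Industry Safety and Emergency Technology (Hefei University of Technology), Hefei 230601, China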

Abstract

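Objective The internet low bit rate codec (iLBC), which builds on speech enhancement and packet loss concealment techniques, still delivers good speech quality on networks with high packet loss rates. The main difficulty in iLBC audio steganography is achieving a good balance among steganographic capacity, imperceptibility, and resistance to detection. To address this, this paper proposes a hierarchical high-capacity steganography method for iLBC speech.

Method First, the structure of the iLBC encoded bitstream is analyzed. Next, the effect on speech quality of embedding during line spectral frequency (LSF) coefficient vector quantization, dynamic codebook search, and gain quantization is analyzed with two metrics: PESQ-MOS (perceptual evaluation of speech quality-mean opinion score), which estimates the subjective mean opinion score, and the objective metric MCD (mel cepstral distortion). On this basis, a hierarchical scheme for embedding positions is proposed: secret bits are embedded in the gain quantization and dynamic codebook search stages, filling the layers one after another in priority order according to the required embedding capacity, so that distortion is kept as low as possible. For a layer that cannot be filled completely, an embedding-position selection method based on the Logistic chaotic map is proposed to improve the randomness and security of the embedding. Finally, quantization index modulation (QIM) is used to embed the secret information, further improving security.

Result Comparative experiments on the Chinese and English speech dataset SSD (steganalysis-speech-dataset) show that the proposed hierarchical method doubles the steganography capacity while maintaining good imperceptibility; the additional secret information does not cause excessive audio distortion. In addition, the method resists detection by a deep learning-based audio steganalyzer well when the payload is at most 18 bit per 30 ms audio frame and at most 12 bit per 20 ms audio frame.

Conclusion The proposed method fully exploits the steganographic potential of iLBC speech: it increases capacity while still ensuring good imperceptibility and resistance to detection.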

Keywords

internet low bit rate codec (iLBC); quantization index modulation; hierarchical steganography; embedding position; high capacity
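The last two steps of the method described in the abstract (choosing embedding positions in a layer that cannot be filled completely by means of a Logistic chaotic map, then embedding bits by quantization index modulation) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy scalar codebook, the gain values, the map parameter μ = 3.99, the initial value `key_x0` used as the shared key, the burn-in length, and the even/odd split of codebook indices into the two QIM sub-codebooks are all hypothetical. Because the chaotic sequence is fully determined by the key, the receiver can regenerate the same positions and read each bit from the parity of the transmitted index.

```python
import numpy as np


def logistic_sequence(x0: float, mu: float, n: int, burn_in: int = 100) -> np.ndarray:
    """Iterate the Logistic map x_{k+1} = mu * x_k * (1 - x_k), discarding a burn-in prefix."""
    x = x0
    for _ in range(burn_in):
        x = mu * x * (1.0 - x)
    seq = np.empty(n)
    for i in range(n):
        x = mu * x * (1.0 - x)
        seq[i] = x
    return seq


def select_positions(n_slots: int, n_bits: int, x0: float, mu: float = 3.99) -> np.ndarray:
    """Choose n_bits distinct slots by ranking a key-dependent chaotic sequence."""
    seq = logistic_sequence(x0, mu, n_slots)
    return np.sort(np.argsort(seq, kind="stable")[:n_bits])


def qim_encode(target: float, codebook: np.ndarray, bit: int) -> int:
    """QIM by codebook partition (hypothetical even/odd split):
    search only the indices whose parity equals the secret bit."""
    idx = np.arange(len(codebook))
    allowed = idx[idx % 2 == bit]
    return int(allowed[np.argmin((codebook[allowed] - target) ** 2)])


def qim_decode(index: int) -> int:
    """The receiver recovers the bit from the parity of the transmitted index."""
    return index % 2


# Hypothetical toy data: a 4-bit uniform scalar gain codebook and 8 gains of one layer.
codebook = np.linspace(0.0, 1.0, 16)
targets = np.array([0.12, 0.47, 0.81, 0.33, 0.65, 0.05, 0.92, 0.58])
secret = [1, 0, 1]   # fewer bits than slots, so the layer is only partially filled
key_x0 = 0.3141      # hypothetical shared secret key

# Sender: ordinary nearest-codeword quantization, then QIM at the chaotic positions.
indices = [int(np.argmin((codebook - t) ** 2)) for t in targets]
for pos, bit in zip(select_positions(len(targets), len(secret), key_x0), secret):
    indices[pos] = qim_encode(targets[pos], codebook, bit)

# Receiver: regenerate the same positions from the key and read the parities.
recovered = [qim_decode(indices[p]) for p in select_positions(len(targets), len(secret), key_x0)]
print(recovered == secret)  # True
```

Splitting the codebook by index parity keeps the added distortion small: with a uniform scalar codebook, the selected codeword is never more than one step away from the unconstrained nearest codeword.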

High-capacity hierarchical steganography in a low-bit rate speech codec

Su Zhaopin¹,²,³,⁴, Zhang Ling¹, Zhang Guofu¹,²,³,⁴
1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China
2. Key Laboratory of Knowledge Engineering with Big Data (Hefei University of Technology), Ministry of Education, Hefei 230601, China
3. Intelligent Interconnected Systems Laboratory of Anhui Province (Hefei University of Technology), Hefei 230009, China
4. Anhui Province Key Laboratory of Industry Safety and Emergency Technology (Hefei University of Technology), Hefei 230601, China

Abstract

Objective Steganography embeds hidden information into digital carriers such as text, images, speech, or video. Audio steganography exploits the perceptual redundancy of the human auditory system and the statistical redundancy of the audio carrier to embed hidden information without noticeable loss of audio quality. The internet low bit rate codec (iLBC), which incorporates speech enhancement and packet loss concealment, maintains good speech quality even on networks with high packet loss rates, so steganography in iLBC speech has attracted increasing attention in the field of information hiding in recent years. However, hiding information in iLBC is difficult because the speech is highly compressed. Moreover, the human auditory system is more sensitive to minor distortions than the human visual system. Most existing methods embed data during line spectral frequency (LSF) coefficient vector quantization, dynamic codebook search, or gain quantization in iLBC. Although these methods achieve good imperceptibility, they usually sacrifice steganography capacity and have difficulty resisting deep learning-based steganalysis. The challenge for iLBC speech steganography is therefore to balance capacity, imperceptibility, and resistance to detection: capacity should be as high as possible, imperceptibility as good as possible, and resistance to steganalysis as strong as possible. We propose a hierarchical high-capacity steganography method for iLBC speech.

Method 1) The structure of the iLBC bitstream is analyzed. 2) Using the perceptual evaluation of speech quality-mean opinion score (PESQ-MOS) and Mel cepstral distortion (MCD), the effect on speech quality of embedding during LSF coefficient vector quantization, dynamic codebook search, and gain quantization is evaluated. Based on this analysis, a hierarchical embedding-position method is proposed: embedding is carried out in the gain quantization and dynamic codebook search stages, filling the layers one after another in priority order according to the required capacity, which keeps distortion as low as possible. For a layer that cannot be filled completely, an embedding-position selection method based on the Logistic chaotic map is developed to improve the randomness and security of the embedding. 3) Quantization index modulation (QIM) is used to embed the hidden information, further improving security.

Result The proposed hierarchical method doubles the steganography capacity. Comparative experiments were conducted on the Chinese and English steganalysis-speech-dataset (SSD), covering 30 ms and 20 ms frames and speech samples of 2 s, 5 s, and 10 s. Results on 5 280 speech samples show that the method maintains good imperceptibility and does not introduce excessive distortion despite embedding more hidden information. To evaluate resistance to a deep learning-based steganalyzer, 4 000 original (cover) and 4 000 steganographic speech samples were generated, of which 75% were used for training and 25% for testing. The detection results show that the method resists the deep learning-based audio steganalyzer well when at most 18 bit are embedded per 30 ms frame and at most 12 bit per 20 ms frame.

Conclusion The proposed hierarchical method fully exploits the steganographic potential of iLBC speech, increasing capacity while maintaining good imperceptibility and resistance to detection.

Keywords

internet low bit rate codec (iLBC); quantization index modulation; hierarchical steganography; embeddable positions; high capacity
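The detection-resistant payloads reported in the abstract can be related to iLBC's frame sizes with a little arithmetic, and the layer-by-layer filling order can be sketched as a greedy allocation. This Python sketch assumes the iLBC frame payloads defined in RFC 3951 (304 bit per 20 ms frame, 400 bit per 30 ms frame). The per-layer capacities passed to the hypothetical helper `allocate_layers` are invented for illustration, since the abstract does not give them; the function shows priority-ordered filling in general and is not the authors' code.

```python
# iLBC frame payloads from RFC 3951: 20 ms -> 304 bit (38 byte), 30 ms -> 400 bit (50 byte).
FRAME_BITS = {20: 304, 30: 400}
# Per-frame payloads up to which the abstract reports good resistance to the steganalyzer.
SAFE_BITS = {20: 12, 30: 18}


def embedding_rate_bps(bits_per_frame: int, frame_ms: int) -> float:
    """Convert a per-frame payload into a hidden bit rate in bit/s."""
    return bits_per_frame * 1000 / frame_ms


def allocate_layers(n_bits: int, layer_caps: list[int]) -> list[int]:
    """Hypothetical helper: fill layers in priority order (highest priority first).
    Only the last layer that receives bits can be partially filled, which is
    where a position-selection step (e.g. the Logistic map) would be needed."""
    plan = []
    remaining = n_bits
    for cap in layer_caps:
        take = min(cap, remaining)
        plan.append(take)
        remaining -= take
    if remaining > 0:
        raise ValueError("payload exceeds the total per-frame capacity of all layers")
    return plan


for ms in (20, 30):
    rate = embedding_rate_bps(SAFE_BITS[ms], ms)
    share = SAFE_BITS[ms] / FRAME_BITS[ms]
    print(f"{ms} ms frames: {rate:.0f} bit/s hidden, {share:.1%} of the frame payload")
# 20 ms frames: 600 bit/s hidden, 3.9% of the frame payload
# 30 ms frames: 600 bit/s hidden, 4.5% of the frame payload

# Three hypothetical layer capacities for one 30 ms frame, listed in priority order.
print(allocate_layers(18, [8, 6, 10]))  # [8, 6, 4]
```

Both reported thresholds correspond to the same hidden rate of 600 bit/s, roughly 4% of the bits in each iLBC frame.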
