High-capacity hierarchical steganography in a low-bit rate speech codec
2022, Vol. 27, No. 12, pp. 3461-3475
Received: 2021-05-18
Revised: 2021-12-15
Accepted: 2021-12-22
Published in print: 2022-12-16
DOI: 10.11834/jig.210307
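Objective
The internet low bit rate codec (iLBC), built on speech enhancement and packet-loss concealment techniques, maintains good speech quality even in network environments with high packet loss rates. The main difficulty for iLBC audio steganography is striking a satisfactory balance among steganographic capacity, imperceptibility, and resistance to detection. To this end, this paper proposes a hierarchical high-capacity steganography method for iLBC speech.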
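Method
First, the structure of the iLBC coded bitstream is analyzed. Then, using the subjective speech quality metric PESQ-MOS (perceptual evaluation of speech quality-mean opinion score) and the objective speech quality metric MCD (mel cepstral distortion), the effect on speech quality of embedding during line spectral frequency (LSF) coefficient vector quantization, dynamic codebook search, and gain quantization is analyzed. On this basis, a hierarchical scheme for embedding positions is proposed: embedding proceeds through the gain quantization and dynamic codebook search processes in order of layer priority, according to the required embedding capacity, so as to keep distortion as low as possible. For a layer that cannot be completely filled, an embedding position selection method based on the Logistic chaotic map is proposed to improve the randomness and security of the embedding. Finally, quantization index modulation is used to embed the secret information, which further improves security.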
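The quantization index modulation step can be illustrated with a minimal Python sketch. It assumes a parity-based partition of a scalar codebook (even indices carry bit 0, odd indices carry bit 1); the abstract does not state which partition the authors use, and the codebook values and the function names `qim_embed`, `qim_extract`, and `embed_bits` are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence

# Hypothetical 8-level scalar gain codebook. The values are illustrative
# only and are NOT the iLBC gain tables.
GAIN_CODEBOOK = [0.15, 0.25, 0.40, 0.55, 0.70, 0.85, 1.00, 1.20]


def qim_embed(target: float, codebook: Sequence[float], bit: int) -> int:
    """Quantize `target` using only codewords whose index parity equals `bit`.

    Assumption: the codebook is split into two sub-codebooks by index parity.
    """
    if bit not in (0, 1):
        raise ValueError("bit must be 0 or 1")
    candidates = [i for i in range(len(codebook)) if i % 2 == bit]
    # Pick the candidate closest to the value the encoder wanted to quantize.
    return min(candidates, key=lambda i: abs(codebook[i] - target))


def qim_extract(index: int) -> int:
    """Recover the hidden bit from the parity of the received index."""
    return index % 2


def embed_bits(targets: Sequence[float], bits: Sequence[int],
               codebook: Sequence[float] = GAIN_CODEBOOK) -> List[int]:
    """Embed one bit per quantized value and return the chosen indices."""
    return [qim_embed(t, codebook, b) for t, b in zip(targets, bits)]
```

Because the receiver only needs the index parity, extraction requires no access to the original speech. The price is that the encoder sometimes picks the second-best codeword, which is the distortion the layering is meant to keep small.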
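Results
Comparative experiments on the Chinese and English speech dataset SSD (steganalysis-speech-dataset) show that the proposed hierarchical method doubles the steganographic capacity while maintaining good imperceptibility, without excessive audio distortion caused by the additional secret information. In addition, the method resists detection by deep learning-based audio steganalyzers well when the payload is at most 18 bit per 30 ms frame or at most 12 bit per 20 ms frame.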
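Conclusion
The proposed method fully exploits the steganographic potential of iLBC speech, maintaining good imperceptibility and resistance to detection while increasing steganographic capacity.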
Objective
Steganography is a technique for embedding hidden information in digital carriers such as text, image, voice, or video data. Audio steganography exploits the redundancy of the human auditory system and the statistical redundancy of the audio carrier to embed hidden information without audible loss of quality. The internet low bit rate codec (iLBC), which builds on speech enhancement and packet-loss concealment, maintains high voice quality in networks with high packet loss rates, and steganography in iLBC speech has therefore attracted attention in the field of information hiding in recent years. However, hiding information in iLBC speech is difficult because of its high compression. Moreover, the human auditory system, unlike the human visual system, is highly sensitive to minor distortions. Most existing methods embed in the line spectral frequency (LSF) coefficient vector quantization, the dynamic codebook search, or the gain quantization of iLBC. Although these methods offer good imperceptibility, they usually do so at the expense of steganographic capacity, and they have difficulty resisting deep learning-based steganalysis. The challenge for iLBC speech steganography is therefore to balance steganographic capacity, imperceptibility, and resistance to detection, so that the capacity is as high, the imperceptibility as good, and the resistance to steganalysis as strong as possible. We develop a hierarchical high-capacity steganography method for iLBC speech.
Method
1) The structure of the iLBC bitstream is analyzed. 2) The effect on voice quality of embedding in the line spectral frequency (LSF) coefficient vector quantization, the dynamic codebook search, and the gain quantization is evaluated using the perceptual evaluation of speech quality-mean opinion score (PESQ-MOS) and Mel cepstral distortion (MCD). On this basis, a hierarchical scheme for embedding positions is proposed: embedding proceeds through the gain quantization and the dynamic codebook search in order of layer priority, according to the required capacity, to keep distortion low. For a layer that cannot be filled completely, an embedding position selection method based on the Logistic chaotic map is developed to improve the randomness and security of the steganography. 3) Quantization index modulation is used to embed the hidden information, which further improves security.
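The position selection for a partly filled layer can be sketched as follows. This is a minimal Python illustration, assuming the standard Logistic map x_{n+1} = mu * x_n * (1 - x_n) with a shared secret key (x0, mu), and assuming positions are chosen by ranking the chaotic values. The ranking rule, the burn-in length, the default key values, and the names `logistic_sequence` and `select_positions` are assumptions, not details taken from the paper.

```python
from typing import List


def logistic_sequence(x0: float, mu: float, n: int, burn_in: int = 100) -> List[float]:
    """Iterate the Logistic map and return n values after a burn-in period.

    Assumption: mu lies in the chaotic regime (roughly 3.57 < mu <= 4)
    and 0 < x0 < 1.
    """
    x = x0
    for _ in range(burn_in):
        # Discard early iterates so the output depends less visibly on x0.
        x = mu * x * (1.0 - x)
    seq = []
    for _ in range(n):
        x = mu * x * (1.0 - x)
        seq.append(x)
    return seq


def select_positions(num_slots: int, num_bits: int,
                     x0: float = 0.3141, mu: float = 3.99) -> List[int]:
    """Choose which `num_bits` of the `num_slots` embeddable positions to use.

    Each slot gets one chaotic value; the slots with the smallest values
    are selected. Sender and receiver run the same code with the same key.
    """
    if not 0 <= num_bits <= num_slots:
        raise ValueError("num_bits must be between 0 and num_slots")
    chaos = logistic_sequence(x0, mu, num_slots)
    order = sorted(range(num_slots), key=lambda i: chaos[i])
    return sorted(order[:num_bits])
```

For example, `select_positions(10, 4)` returns four distinct slot numbers out of ten, and the receiver obtains the same four by calling it with the same key. Since the map is iterated in floating point, both sides would need matching arithmetic for the sequences to agree.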
Result
Our hierarchical steganography method doubles the steganographic capacity. We run comparative experiments on the Chinese and English speech dataset steganalysis-speech-dataset (SSD), which includes 30 ms and 20 ms frames and 2 s, 5 s, and 10 s speech samples. The experimental results on 5 280 speech samples show that our method maintains good imperceptibility and avoids excessive distortion even though it embeds more hidden information. To validate resistance to detection by a deep learning-based steganalyzer, we generate 4 000 original speech samples and 4 000 steganographic speech samples, of which 75% are used as the training set and 25% as the test set. The detection results show that, when the payload is at most 18 bit per 30 ms frame or at most 12 bit per 20 ms frame, the method resists detection by the deep learning-based audio steganalyzer well.
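As a quick check of the figures above, the two per-frame limits correspond to the same embedding rate, assuming every frame carries the maximum payload. The frame sizes of 400 bit (30 ms mode) and 304 bit (20 ms mode) are taken from the iLBC specification, not from this abstract.

```latex
\begin{align*}
R_{30\,\mathrm{ms}} &= \frac{18\ \mathrm{bit}}{0.030\ \mathrm{s}} = 600\ \mathrm{bit/s},
&\frac{18}{400} &= 4.5\% \text{ of each frame},\\
R_{20\,\mathrm{ms}} &= \frac{12\ \mathrm{bit}}{0.020\ \mathrm{s}} = 600\ \mathrm{bit/s},
&\frac{12}{304} &\approx 3.9\% \text{ of each frame},\\
N_{\mathrm{train}} &= 0.75 \times 8000 = 6000,
&N_{\mathrm{test}} &= 0.25 \times 8000 = 2000.
\end{align*}
```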
Conclusion
A high-capacity hierarchical steganography method is developed for iLBC speech. It exploits the steganographic potential of iLBC speech, preserving good imperceptibility and resistance to detection while extending steganographic capacity.