张晏铭, 陈可江, 丁锦扬, 张卫明, 俞能海(中国科学技术大学)
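Objective Covert communication is an important research direction in information security. Existing methods that build covert channels on multimedia data streams do not consider the packet loss caused by fluctuations in network transmission. This paper proposes a covert communication method, based on cross-modal information retrieval, that is robust to network anomalies while also meeting the requirements of high covertness and high security. Method This paper proposes a general covert communication framework named RoCC (robust covert communication), based on cross-modal information retrieval and provably secure steganography. The method combines direct and indirect communication. Direct communication takes place over a VoIP (voice over Internet Protocol) call service, which carries an audio stream generated in real time; the receiver can restore it to text through speech recognition. Indirect communication uses a public network database to transmit the stego data; the receiver recovers the semantically complete stego text by text semantic similarity matching, which helps resolve the loss of text semantics caused by network packet loss and speech recognition errors. Result Experiments show that the method is more general with respect to protocols, improves resistance to packet loss by 5% over the currently most robust method, and uses a steganographic algorithm that is provably secure. RoCC also achieves a data transmission rate of 73-136 bps, which meets the needs of real-time communication. Conclusion The RoCC covert communication framework combines the advantages of provably secure steganography, generative machine learning methods, and cross-modal retrieval methods. Compared with existing methods, it is more covert and more secure, and it is currently the model most robust to packet loss during data transmission.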
RoCC: Robust Covert Communication Based on Cross-Modal Information Retrieval
Zhang Yanming, Chen Kejiang, Ding Jinyang, Zhang Weiming, Yu Nenghai(University of Science and Technology of China)
Objective Covert communication is a pivotal research area in information security. To safeguard the privacy of communicating users and prevent eavesdropping on confidential data transmissions, a highly covert and secure channel is needed for transmitting sensitive information. Most existing methods build covert channels by tunneling multimedia streams but do not consider the packet loss caused by fluctuations in network transmission. This paper proposes a covert communication method, based on cross-modal information retrieval and provably secure steganography, that is robust to network anomalies. Method We propose a general covert communication framework named RoCC (robust covert communication), based on cross-modal information retrieval and provably secure steganography. Content generated by artificial intelligence systems has proliferated; examples include deep synthesis models, AI-driven artwork, intelligent voice assistants, and conversational chatbots. These AI models can synthesize multimodal data such as video, images, audio, and text. At the same time, advances in generative models have made provably secure steganography practical. We therefore introduce generative models and provably secure steganography into our framework, embedding secret messages within generated cover text data. In addition, speech synthesis and recognition now offer many mature open-source models, which enable cross-modal conversion between speech and text. Our approach combines direct and indirect communication. Direct communication uses a VoIP (voice over Internet Protocol) call service to deliver an audio stream synthesized in real time, from which the receiver can restore the text through speech recognition.
Indirect communication uses a public network database to transmit the steganographic text; the receiver restores the text semantics lost to network packet loss and speech recognition errors by means of text semantic similarity matching. The entire communication process can be described as follows. Assume that the sender of the confidential data is Alice and the recipient is Bob, and that the two share the same generative model and parameter settings for provably secure steganography. Alice embeds the confidential data into generated text using provably secure steganography and publishes it on a publicly accessible and searchable network database. The only direct communication between the two parties is a VoIP voice call, over which network packets may be lost. Using the semantic information that survives the call, Bob performs cross-modal information retrieval on the public database and locates the corresponding steganographic text among the cover texts. Then, using the same generative model and steganography parameter settings, Bob recovers the confidential data from the steganographic text. Result Speech recognition experiments show that speech recognition often causes semantic loss. Even the best model has a sentence error rate of 0.6125, which fails to meet the text recovery capability required to build a covert channel through direct cross-modal conversion. Text similarity analysis experiments indicate that the best model achieves a recall of 1.0, which in theory enables complete restoration of the semantic information. In the packet loss experiment, RoCC achieves an information recovery rate of 0.9921 at a packet loss rate of 10% with a K value of 2.
This demonstrates its resilience to network anomalies and establishes it as the current state of the art in this regard. In the real-time performance experiment, we validated the efficiency of each component of the RoCC system, including speech synthesis and recognition, secure steganographic encoding and decoding, and text semantic similarity analysis. These results demonstrate that it meets the real-time requirements of covert channel communication. In comparative experiments, RoCC is compared with eight representative methods and shows clear advantages in protocol versatility, robustness, and the provable security of its steganography. In the packet loss experiment, resistance to packet loss improves by 5% over the current most robust model. Conclusion The covert communication framework proposed in this paper combines the advantages of provably secure steganography, generative machine learning methods, and cross-modal retrieval methods, making the covert communication process more stealthy and secure. We also implemented the first method that uses semantic similarity to restore communication data lost to transmission anomalies. Experiments verify that the framework meets the requirements of real-time communication, with a real-time transmission rate of 73-136 bps.
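
The retrieval step described in the Method part, in which Bob matches a degraded transcript against a public database, can be illustrated with a minimal sketch. It is not the RoCC implementation: a bag-of-words cosine similarity stands in for the semantic similarity model, dropping every third word stands in for packet loss and recognition errors, and speech synthesis, speech recognition, and steganographic embedding are omitted. All names and sample texts (public_database, drop_every, retrieve) are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def drop_every(text, n):
    """Crude stand-in for a lossy call: remove every n-th word."""
    words = text.split()
    return " ".join(w for i, w in enumerate(words) if i % n != n - 1)


def retrieve(transcript, database):
    """Return the index of the database entry most similar to the transcript."""
    query = bag_of_words(transcript)
    scores = [cosine_similarity(query, bag_of_words(doc)) for doc in database]
    return max(range(len(database)), key=scores.__getitem__)


# Hypothetical public database; entry 1 plays the role of Alice's stego text.
public_database = [
    "the weather in the mountains changed quickly during the afternoon hike",
    "our team released a new version of the library with faster startup",
    "the museum opens a photography exhibition about coastal towns next week",
]

# What Bob "hears" after the lossy call: every third word is missing.
heard = drop_every(public_database[1], 3)
match = retrieve(heard, public_database)
print(heard)
print(match)
```

In this toy setting the degraded transcript still retrieves the intact published text, from which Bob could then run steganographic extraction. A real system would presumably replace the word-count vectors with embeddings from a semantic similarity model, which is an assumption here rather than something the abstract specifies.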