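Objective Perceptual image hashing, also known as image digest or image fingerprinting, is an effective image authentication technique that has attracted wide attention in recent years. It authenticates image copyright by converting the perceptually robust features of an image into a fixed-length hash sequence. However, the field still lacks a reasonably general dataset, and the content-preserving manipulations used in existing datasets differ considerably from real-world scenarios, so the performance of neural networks trained on them drops markedly when they face complex image editing operations. Method For the perceptual image hashing task, this paper builds a new dataset oriented toward practical image content authentication scenarios. First, common real-world content-preserving manipulations are summarized and categorized, and 48 single and combined content-preserving manipulations are designed to generate perceptually similar images. Then, following the definition of perceptual image hashing, images that are semantically similar to the image to be authenticated but differ in perceptual content are selected as perceptually dissimilar images, which makes the dataset harder to discriminate. The resulting perceptual hashing image dataset contains 116,400 images. Result Because the content-preserving manipulations used in the proposed dataset are more complex and the dissimilar images are harder to distinguish, deep neural networks trained on it generalize well: even without retraining or fine-tuning, they achieve good authentication performance on other datasets. In addition, networks trained on this dataset show little performance variation across datasets, which reflects the stability of the dataset. Conclusion This paper designs an image dataset for perceptual hashing. Extensive comparative experiments demonstrate its effectiveness, and the work can help advance the field of perceptual image hashing.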
A large-scale image dataset for perceptual hashing
Zhou Yuanding, Fang Yaodong, Qin Chuan (University of Shanghai for Science and Technology)
Objective With the rapid development of social media, multimedia information on the internet is updated at an exponential rate. Digital images are easy to obtain and transmit, which greatly increases the risk of malicious tampering and forgery. Accordingly, image authentication and content protection have received increasing attention. Many image authentication schemes have emerged recently, such as watermarking, digital signatures, and perceptual image hashing. Perceptual image hashing (PIH), also known as image digest or image fingerprinting, is an effective image authentication technique that has attracted wide research attention in recent years. The goal of PIH is to authenticate an image by compressing its perceptually robust features into a compact, fixed-length hash sequence. However, the field lacks a general dataset, and the datasets constructed in previous work have several problems. On the one hand, they use few types of image content-preserving manipulations, and the attack intensity is relatively weak. On the other hand, the distinct images in these datasets differ so much from the images to be authenticated that the two are easy to tell apart. Convolutional neural networks (CNNs) trained on these datasets generalize poorly and can hardly cope with the complex and diverse image editing operations encountered in practice. This has become an important factor limiting the development of the perceptual image hashing field. Method To address these problems, we propose a specialized dataset built from a wide range of manipulations and intended for complex image authentication scenarios. The proposed dataset is divided into three subsets: original images, perceptually identical images, and perceptually distinct images. The latter two correspond to the robustness and the discrimination of PIH, respectively.
Original images are selected from ImageNet1K, and each of them corresponds to one category. For the identical images, we summarize the content-preserving manipulations commonly used in the field of perceptual image hashing and group them into four major categories: geometric manipulations, enhancement manipulations, filter manipulations, and editing manipulations. Each major category is subdivided into several types, giving 35 single image content-preserving manipulations in total. To ensure diversity and reflect the randomness of real-world image editing, we set an intensity range for each manipulation and draw the attack intensity at random within that range. In addition, we randomly combine multiple single manipulations to form combined manipulations. Because of this randomness, the test set contains some combined manipulations that never appear in the training set. This matches practical application scenarios, where many unseen combinations of editing operations occur. For the perceptually distinct images, one portion is unrelated to the original images, while the other portion is selected from the same category as each original image, which increases the difficulty of the dataset and improves the generalizability of the trained CNNs. Compared with previously adopted datasets, our dataset conforms more closely to the actual application scenario of the PIH task. Our dataset contains 1200 original images, and each original image is subjected to 48 image content-preserving manipulations to generate 48 perceptually identical images. To balance the number of perceptually identical and distinct images, we also select 48 perceptually distinct images for each original image. Of these, 24 are randomly selected, and the other 24 are semantically similar to the original image.
Therefore, each batch contains one original image, 48 perceptually identical images, and 48 perceptually distinct images, totaling 97 images. With 1200 original images, the dataset has 116,400 images in total, and this large amount of data supports effective training of CNNs. Result To validate the performance of the proposed dataset (PIHD), four convolutional neural networks were trained on five datasets, including PIHD, and tested across these datasets. The receiver operating characteristic (ROC) curves of the models are compared to judge their performance. Since the content-preserving manipulations used in this dataset are more complex and the distinct images are more difficult to distinguish, convolutional neural networks trained on this dataset provide better image authentication performance. Even without retraining or fine-tuning, they also obtain satisfactory image authentication performance on other datasets, which demonstrates the generalizability of the PIHD dataset. In addition, we compare the area under the curve (AUC) of each model on different test sets. The results show that the performance of networks trained on the other comparison datasets varies greatly across test sets, while the performance of networks trained on PIHD is almost constant across datasets, reflecting the stability of the PIHD dataset. Collectively, the networks trained on our dataset are stable and have a degree of generalization ability, so they can cope with complex and diverse real-world editing operations. Conclusion In this paper, we design a dataset for the perceptual image hashing task. It uses a richer set of image content-preserving manipulations and incorporates randomness to reproduce real application scenarios as closely as possible. In addition, images with the same semantic meaning as the original images are added to the distinct images in the dataset, which increases the difficulty of the dataset in keeping with the perceptual image hashing task.
This enables the trained CNNs to cope with more realistic and complex application scenarios. We tested the dataset with different models on different datasets, including the proposed one, and extensive experiments demonstrate its effectiveness, generalizability, and stability. This dataset can promote the development of the PIH field.
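
The construction summarized in the Method part can be illustrated with a minimal Python sketch. It is not the authors' implementation: the manipulation names, their intensity ranges, the number of manipulations chained in a combination, and the helper names (`sample_single`, `sample_combined`, `plan_group`, `dataset_size`) are all hypothetical. The split of the 48 manipulations into 35 single and 13 combined ones is inferred from the counts given above and is likewise an assumption. The sketch only plans which manipulations to apply and with what intensity; it does not process images.

```python
import random

# Hypothetical manipulations and intensity ranges. The abstract does not list
# the actual 35 single manipulations or their ranges.
MANIPULATIONS = {
    "rotation_deg": (-10.0, 10.0),
    "scaling_factor": (0.5, 2.0),
    "brightness_shift": (-0.3, 0.3),
    "gaussian_blur_sigma": (0.5, 3.0),
    "jpeg_quality": (30.0, 95.0),
}


def sample_single(rng):
    """Pick one manipulation and draw its intensity uniformly from its range."""
    name = rng.choice(sorted(MANIPULATIONS))
    low, high = MANIPULATIONS[name]
    return [(name, rng.uniform(low, high))]


def sample_combined(rng, max_ops=3):
    """Chain 2..max_ops distinct manipulations, each with a random intensity."""
    k = rng.randint(2, max_ops)
    names = rng.sample(sorted(MANIPULATIONS), k)
    return [(n, rng.uniform(*MANIPULATIONS[n])) for n in names]


def plan_group(rng, n_single=35, n_combined=13):
    """Plan the 48 manipulation recipes for one original image.

    The 35/13 split is an assumption inferred from the stated totals.
    """
    plan = [sample_single(rng) for _ in range(n_single)]
    plan += [sample_combined(rng) for _ in range(n_combined)]
    return plan


def dataset_size(n_originals=1200, n_identical=48, n_distinct=48):
    """Each group holds 1 original + identical + distinct images."""
    return n_originals * (1 + n_identical + n_distinct)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the sketch repeatable
    plan = plan_group(rng)
    print(len(plan), "manipulation recipes per original image")
    print(dataset_size(), "images in total")
```

With the default arguments, `plan_group` returns 48 recipes per original image and `dataset_size()` evaluates to 1200 × 97 = 116,400, matching the total stated above. Because the combined recipes are drawn at random, a train/test split produced this way will generally contain combinations in the test portion that were never seen during training, which is the property the abstract highlights.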