Latest Issue

    Vol. 28, No. 3, 2023

      Review

    • Ruoyu Zhao, Xi Ye, Wentao Zhou, Yushu Zhang, Xiuli Chai
      Vol. 28, Issue 3, Pages: 645-665(2023) DOI: 10.11834/jig.220533
      Cloud-stored image thumbnail-preserving encryption
Abstract: Image cloud storage platforms have developed rapidly in recent years thanks to information sharing, cost efficiency, low local storage demands, and elastic scalability, driving steady growth in the number of images uploaded to the cloud. Cloud storage, however, brings privacy risks along with its convenience. In the 2014 Hollywood photo scandal, for example, private photos that actresses had uploaded directly to iCloud were leaked after the service was hacked. Classical image encryption alleviates such privacy concerns by converting visually meaningful originals into meaningless, salt-and-pepper-like noise images, but it discards the visual usability of images in the cloud: users browsing the cloud obtain no useful visual information and are forced to choose between privacy and usability. To find a particular image, a user must repeatedly download and decrypt encrypted images until the required one turns up; in the worst case, retrieving a single image means downloading and decrypting every encrypted image stored in the cloud. This reduces the cloud to bare storage and ignores most of its usability, which may be one reason image encryption remains rare in cloud services. Balancing privacy and usability is thus a pressing challenge in multimedia research.
The emerging thumbnail-preserving encryption (TPE) balances image privacy and usability by preserving the original thumbnail in the encrypted image: visual information at scales larger than the thumbnail is preserved, finer information is removed, and users can identify images by browsing this configurable coarse visual information. All TPE schemes first divide the image into thumbnail blocks whose size is chosen by the user; visual information larger than a thumbnail block is preserved in the encrypted image, while information smaller than a block is deleted. A larger block size gives better privacy protection but worse usability, and a smaller size the opposite, so users can tune the privacy-usability balance simply by resizing the blocks. This balance tunability lets users with different privacy sensitivities choose their own balance point. Privacy is complex and subjective, varying from person to person and from time to time; whatever a user considers private should be protectable, so privacy protection should give users the right to decide the degree of protection. TPE grants this right through a single parameter, the block size. The pixels within each thumbnail block are then encrypted under the premise that the sum of pixel values in the block remains unchanged before and after encryption. The closer the pixel sum of an encrypted block is to that of the corresponding original block, the higher the quality of the thumbnail preserved in the encrypted image, i.e., the better the visual quality of the encrypted image.
This paper briefly introduces the aims and objects of TPE research, reviews existing TPE schemes, and predicts potential research directions and applicable scenarios. By the visual quality of the encrypted and decrypted images, existing schemes fall into two types: 1) ideal TPE, in which the thumbnail of the encrypted image is identical to that of the original and the decrypted image is exactly the original; and 2) approximate TPE, in which the encrypted image's thumbnail is a perceptually similar approximation of the original's, and the decrypted image may differ slightly from the original in some pixels without any perceptible visual difference. Existing TPE schemes are described in detail in terms of key techniques, framework, and security, and the pros, cons, and possible improvements of each scheme are analyzed by technical mechanism. Representative TPE schemes are also compared experimentally, covering visual quality evaluation, size expansion evaluation, face detection evaluation, and a user experiment; all experiments run on the same machine (i7-8700 CPU @ 3.2 GHz, 16 GB RAM, Windows 10), mostly under MATLAB 2016b. Finally, open problems of TPE schemes and future directions for improving them are summarized. In sum, the existing TPE schemes are systematically reviewed on the basis of their research status and development trends.
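To make the sum-preserving constraint concrete, here is a minimal Python sketch. It illustrates the constraint only and is not any published TPE cipher: unlike a real scheme it is not keyed-invertible, and real TPE uses keyed invertible permutations and substitutions.

```python
import numpy as np

def tpe_encrypt_block(block, rng):
    """Scramble pixel values inside one thumbnail block while keeping the
    block's pixel sum (and hence its thumbnail value, the block mean) fixed."""
    flat = block.astype(np.int64).ravel()
    rng.shuffle(flat)                        # a permutation preserves the sum
    for _ in range(flat.size):               # random pairwise value transfers
        i, j = rng.integers(0, flat.size, size=2)
        t = int(rng.integers(0, 256))
        t = min(t, 255 - flat[i], flat[j])   # keep both pixels inside [0, 255]
        flat[i] += t
        flat[j] -= t
    return flat.reshape(block.shape).astype(np.uint8)

rng = np.random.default_rng(2023)            # the seed plays the role of a key
block = rng.integers(0, 256, size=(16, 16), dtype=np.uint8)
enc = tpe_encrypt_block(block, rng)
assert int(block.sum()) == int(enc.sum())    # thumbnail value is preserved
```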
Keywords: cloud storage security; image encryption; image privacy; usability preserving; visual usability
Published: 2024-05-07

      Image Processing

    • Lin Kuang, Menglei Zou, Wenying Wen
      Vol. 28, Issue 3, Pages: 666-679(2023) DOI: 10.11834/jig.220512
Two-dimensional compressed-sensing-relevant thumbnail-preserving encryption method
Abstract:
Objective: Cloud storage services are challenged by high data-transfer overhead, data tampering, and possible leakage of user privacy, such as images being exposed to unauthorized third parties, which threatens sensitive information including a user's identity, health status, and location.
Method: We develop a data privacy protection scheme that combines two-dimensional compressed sensing (2DCS) with thumbnail-preserving encryption (TPE). A deterministic binary block diagonal (DBBD) matrix serves as the measurement matrix for sampling the original image, so the sampling result preserves the structural similarity of the image; the visual experience is further improved by mapping the compressed signal to a YUV-type signal, enhancing the reconstruction quality of the luminance component, and generating a key pixel set from the error pixel set and the saliency pixel set. Since the compressed signal is visually meaningful and can be regarded as a downsampled version of the original image, a two-dimensional discrete wavelet transform decomposes it into four parts: a low-frequency part that retains most of the visual information of the sampled values, and three high-frequency parts that contain only detail information. Embedding the key pixel set into the high-frequency part of the compressed signal removes much redundant information while preserving the image recovery quality. To achieve a good balance between privacy and usability, TPE lets legitimate users preview images in the cloud, using the visual information of the ciphertext images together with prior knowledge to identify, organize, and manage them, while illegal third parties without that prior knowledge cannot recover the exact plaintext from the ciphertext images.
Result: The scheme preserves the morphology and essential features of the image, including key parts such as edges and textures, while compressing it to reduce storage cost. The thumbnail of the cipher image is similar to the original thumbnail, which ensures the security of the cipher image without sacrificing the availability of cloud services. Experimental results show that reconstructed images are visually very close to their originals: at a compression rate of 0.25, the average peak signal-to-noise ratio (PSNR) exceeds 31.8 dB and the structural similarity (SSIM) exceeds 0.97. Compared with reconstruction that excludes the key pixel set, PSNR and SSIM improve by about 4%-6% and 1%-3%, respectively. The algorithm also recovers images better than the three compared CS methods in terms of PSNR and SSIM, 2DCS-TPE shows no significant cliff effect at extreme compression ratios, and its PSNR and SSIM gains stay consistent across compression ratios.
Conclusion: The proposed 2DCS-TPE scheme improves image reconstruction quality while guaranteeing privacy, usability, and the legitimate user experience, and achieves high security with low computational complexity.
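A minimal sketch of the 2D sampling step follows, assuming a simplified stand-in for the DBBD matrix whose rows average disjoint pixel groups (the paper's actual matrix construction differs):

```python
import numpy as np

def dbbd_like_matrix(m, n):
    """Hypothetical stand-in for the DBBD measurement matrix: a binary
    block-diagonal matrix (scaled) whose rows average disjoint pixel groups."""
    phi = np.zeros((m, n))
    step = n // m                                 # assumes m divides n
    for i in range(m):
        phi[i, i * step:(i + 1) * step] = 1.0 / step
    return phi

# 2D compressed sampling: Y = Phi @ X @ Phi.T samples both image dimensions
rng = np.random.default_rng(0)
x = rng.integers(0, 256, (256, 256)).astype(float)   # stand-in image
phi = dbbd_like_matrix(128, 256)                     # compression rate 0.25
y = phi @ x @ phi.T                                  # 128x128 measurements
# because each row of phi averages a pixel block, y is itself a visually
# meaningful downsampled version of x, which is the property the scheme exploits
```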
Keywords: cloud storage; thumbnail-preserving encryption (TPE); compressed sensing (CS); privacy; availability
Published: 2024-05-07
    • Chang Liu, Hong Zheng, Tianyu Wang, Chengzhuo Zhou
      Vol. 28, Issue 3, Pages: 680-690(2023) DOI: 10.11834/jig.220022
      Functional patterns-relevant blind-deblurring method for anti-counterfeiting code images
Abstract:
Objective: An anti-counterfeiting code is a specially designed variant of the quick response (QR) code that adds anti-counterfeiting and traceability functions beyond ordinary encoding and decoding, so the quality requirements on anti-counterfeiting code images are high. In practice, captured anti-counterfeiting code images are often blurred by camera noise, relative motion between the camera and the code, and defocus errors. Thanks to the error-correction capability of QR codes, slight blur degradation usually does not impair decoding, but authenticity identification remains difficult, so blurred anti-counterfeiting code images must be restored. Most current blind deblurring algorithms target natural images: they do not exploit the characteristics of anti-counterfeiting codes, restore them poorly, and are computationally expensive. To resolve this, we develop a blind deblurring method based on the anti-counterfeiting code's functional patterns.
Method: First, the blurred anti-counterfeiting code image is converted to grayscale and resized to 512 × 512 pixels by bilinear interpolation. Intensity and gradient priors are then formulated from the binary nature of the code. The intensity prior captures the fact that gray values of a clear code image concentrate near 0 and 255, whereas those of a blurred one scatter between 0 and 255. The gradient prior concerns the differences between adjacent pixels in the horizontal and vertical directions: gradients of a clear code image concentrate around 0, 255, and -255, whereas those of a blurred image scatter between -255 and 255. Next, the blurred image is divided into four equal blocks (upper left, lower left, upper right, and lower right), from which the three position-detection patterns (upper left, lower left, upper right) and the alignment pattern (lower right) are extracted. Block processing offers two benefits: it speeds up the scale-related optimization of the regularized deblurring method, and estimating a blur kernel per block instead of a single kernel for the whole image handles non-uniform blur and improves the deblurring result. Finally, with a cost function constrained by the intensity and gradient priors, the deblurring problem is decomposed into two subproblems, and the clear images and blur kernels of the four blocks are obtained by regularization and numerical methods.
Result: We first test 100 artificially blurred anti-counterfeiting code images covering motion blur, defocus blur, and their combination, using peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and running time as indicators. The results show that our algorithm handles varied degrees of motion and defocus blur. We then test 50 blurred anti-counterfeiting code images captured by mobile phones, evaluated with the natural image quality evaluator (NIQE). Compared with several popular algorithms, including blind image deblurring using patch-wise minimal pixels regularization, our average NIQE value is lower by 3.02 and the running time is shorter by 22.07 s, and the details of the anti-counterfeiting pattern are restored well.
Conclusion: The proposed method provides an easy-to-use blind deblurring approach for anti-counterfeiting code images that guarantees both deblurring quality and time efficiency.
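As a sketch, the two-prior deblurring objective described above can be written in a generic blind-deconvolution form (notation assumed, not the paper's exact formulation: $u$ is the latent sharp block, $k$ the blur kernel, $b$ the blurred block, $\otimes$ convolution, and $\lambda, \mu, \gamma$ regularization weights):

```latex
\min_{u,\,k}\; \|u \otimes k - b\|_2^2
  + \lambda\, P_{\mathrm{int}}(u)      % intensity prior: gray values near 0 or 255
  + \mu\, P_{\mathrm{grad}}(\nabla u)  % gradient prior: gradients near 0, \pm 255
  + \gamma\, \|k\|_2^2                 % kernel regularizer
```

Alternating minimization then yields the two subproblems mentioned in the abstract: fix $k$ and solve for the sharp block $u$, then fix $u$ and update the blur kernel $k$, independently for each of the four blocks.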
Keywords: anti-counterfeiting code; blind deblurring; functional pattern; intensity prior and gradient prior; regularization; image quality evaluation
Published: 2024-05-07
    • Jianxin Teng, Jiefeng He, Jinchun Yuan, Fengchuang Xing, Hanpin Wang
      Vol. 28, Issue 3, Pages: 691-701(2023) DOI: 10.11834/jig.220648
      Edge-enhanced ultra high definition video quality assessment
Abstract:
Objective: 4K (3 840 × 2 160 pixels) ultra high definition (UHD) video has developed rapidly with emerging network and television technology. However, distortion acquired during acquisition, compression, transmission, and storage remains a challenge because of the huge data volume, rich edge and texture information, and high resolution of UHD video. Since UHD video quality assessment (VQA) has become a crucial research topic in television broadcasting, our research focuses on an edge-enhanced UHD-VQA method.
Method: First, each input video frame is split into its R, G, and B channels; an edge detection operator detects the edge information of each channel, and the three channel results are combined into the frame's edge map. Spatial information is extracted with a content-oriented design guided by the human visual system (HVS): each frame is fed into a ResNet-50 trained on ImageNet-1K, and the globally pooled feature maps are processed in three steps: 1) the extracted feature maps pass through a gated recurrent unit, 2) min pooling and softmin pooling process the output features, and 3) the prediction score is computed as a weighted sum. Edge-masking, content-oriented, and temporal-memory network modules are designed to extract these multiple features, which are finally fed into a fully connected network for dimensionality reduction to yield the quality features and the calculated video quality score. Because UHD video has high definition and rich edge detail, severe distortion is especially likely at edges, so the edge-enhanced method is particularly suited to UHD video quality assessment; the content-oriented and time-domain hysteresis features also make the method applicable to the quality assessment of a wider range of in-the-wild videos.
Result: Experiments compare our method with 5 popular methods on 4 datasets, with improvements on four fronts: 1) 3.9% SROCC (Spearman rank-order correlation coefficient) and 3.9% PLCC (Pearson linear correlation coefficient) on KoNViD-1K; 2) 4.2% SROCC and 2.2% PLCC on DVL2021; 3) 10.0% SROCC and 10.1% PLCC on LIVE-Qualcomm; and 4) 0.6% SROCC and 0.1% PLCC on LSVQ. A cross-dataset experiment demonstrates the generalization ability, and an ablation study verifies the effectiveness of the edge information: without edge masking, the network cannot be trained to match the features that edge masking provides.
Conclusion: An edge-enhanced method is presented to assess UHD video quality and alleviate edge distortion, with content-oriented and time-domain hysteresis features introduced to address the UHD-VQA problem. The Canny operator detects the edge information of video frames, and the training parameters are configured to handle the heterogeneity of multiple video datasets. To verify the effectiveness of the proposed method, extensive experiments are conducted on 4 popular video quality assessment datasets (UHD included); the largest performance gain reaches 10.0% and the smallest is 0.1%. These results show that edge information can greatly improve the performance of VQA methods. The computational cost is also much lower because no optical flow is used. Future research may exploit more HVS features for the no-reference VQA (NR-VQA) problem.
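The per-channel edge-detection step can be sketched as follows (the Canny thresholds and the rule for merging the three channel maps are assumptions):

```python
import cv2
import numpy as np

def frame_edge_map(frame_bgr, lo=100, hi=200):
    """Per-channel Canny edges merged into one frame-level edge map,
    mirroring the R/G/B edge-detection step described above."""
    b, g, r = cv2.split(frame_bgr)
    edges = [cv2.Canny(c, lo, hi) for c in (r, g, b)]
    return np.maximum.reduce(edges)          # union of the three edge maps

frame = np.random.randint(0, 256, (2160, 3840, 3), dtype=np.uint8)  # 4K frame
edge_map = frame_edge_map(frame)             # same H x W, values in {0, 255}
```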
Keywords: ultra high definition video; video quality assessment (VQA); convolutional neural network (CNN); human visual system (HVS); edge enhancement
Published: 2024-05-07

      Information Hiding

    • Yongjian Hu, Xiongbo Huang, Yufei Wang, Beibei Liu, Shuowei Liu
      Vol. 28, Issue 3, Pages: 702-715(2023) DOI: 10.11834/jig.220328
      Improving deep learning-based video steganalysis with motion vector differences
Abstract:
Objective: Video steganography and video steganalysis have been widely studied because video is an ideal cover medium for achieving high embedding capacity. The booming deep learning technique has recently been introduced to video steganalysis, and a few deep neural networks have been published to detect secret embedding in motion vectors (MVs). However, the current deep neural networks (DNNs) for video steganalysis report only mediocre detection accuracies compared with traditional handcrafted-feature-based steganalysis approaches. We conjecture that this performance limitation is due to the inadequate information provided to the network. Following the principles of video encoding, we explore the impact of steganographic embedding on different encoding parameters. Our aim is to extend the detection space by searching for abnormalities in coding parameters raised by steganography, constructing multiple input channels to improve the detection performance of steganalysis networks.
Method: We first analyze how motion vector differences (MVDs) are influenced by secret embedding in MVs. The histogram of MVDs exhibits visible changes in bin height after MV embedding. Since MVDs convey critical information for revealing MV alteration, we propose to treat them as an extra sampling space for the video steganalysis network, in addition to the existing MV and prediction residual spaces. However, MVDs are irregularly and sparsely distributed in individual frames and are therefore difficult to calibrate among consecutive frames. We deliberately design a method for constructing the input channels of MVD samples that is compatible with the existing network architecture. Specifically, two matrices record the vertical and horizontal components of the MVDs. Since the prediction unit (PU) partition varies from frame to frame, we take the minimum 4×4 block as the basic sampling unit, and the vertical and horizontal MVD components of each 4×4 block are recorded as one element in the vertical and horizontal MVD matrices, respectively. In the H.265/HEVC (high efficiency video coding) format, some blocks do not involve inter-frame prediction and thus have neither MVs nor MVDs, while others use inter-frame prediction but adopt the Merge and Skip modes and therefore have MVs but no MVDs; for these two types of blocks, the corresponding elements are set to zero in the MVD matrices. The newly introduced MVD channels can work alone or together with other channels such as MVs and prediction residuals. By incorporating the MVD channels into current video steganalysis networks, we obtain improved networks for various tasks: the improved VSRNet (IVSRNet), the selection-channel-aware improved VSRNet (SCA-IVSRNet), and the quantitative improved VSRNet (Q-IVSRNet).
Result: We conduct extensive experiments against 5 target steganographic methods with varying resolutions, bit rates, and embedding rates. All embedding and detection operate on H.265/HEVC videos. Two of the classical target methods, originally designed for H.264 videos, are transplanted to H.265/HEVC; the remaining three are recently published H.265/HEVC-specific steganographic methods. We first evaluate the MVD-VSRNet, which uses only the MVD and prediction residual channels without the MV channels. Increased accuracies over the baseline VSRNet, which employs MV and prediction residual channels, verify the discriminating capability of MVDs for stego videos. The IVSRNet, adopting the MV, prediction residual, and MVD channels, achieves an even better result. We then evaluate the SCA-IVSRNet, which integrates the IVSRNet with an embedding-probability channel; its performance exceeds both the IVSRNet and the SCA-VSRNet. We compare against several milestone handcrafted-feature-based video steganalysis approaches for MV-based steganography, including the adding-or-subtracting-one (AoSO), motion vector reversion-based (MVRB), and near-perfect estimation for local optimality (NPEFLO) algorithms, and also include local optimality in candidate list (LOCL), the latest state-of-the-art (SOTA) steganalysis method that exploits specific features of the H.265/HEVC standard. The SCA-IVSRNet surpasses all the other methods against the two transplanted target steganographic methods. For the H.265/HEVC-specific steganography, the SCA-IVSRNet trails the NPEFLO and LOCL methods marginally, by less than 2%, but exceeds the remaining methods by around 10%. Among the five targets, the most challenging one does not directly change the MV values; in this case the SCA-IVSRNet reports accuracies around 67%, only 0.3% behind the first-place LOCL, while the IVSRNet also reaches 63%, verifying again the important role of the proposed MVD channels. Finally, we assess the Q-IVSRNet on the quantitative steganalysis task: its mean absolute errors (MAEs) are consistently less than those of the Q-VSRNet, which can be attributed to the effectiveness of the MVD channels.
Conclusion: This work aims at improving the detection accuracy of convolutional neural network (CNN) based steganalyzers for MV-based video steganography. The current input spaces of MVs and prediction residuals do not convey adequate steganalytic information, so we extend the detection space to MVDs. The newly introduced MVD channel is fully compatible with current CNN-based video steganalyzers, leading to several improved steganalysis networks. Extensive experiments show that the improved detection networks not only surpass their precedent versions by a large margin but also catch up with or even exceed some popular handcrafted-feature-based steganalyzers. This work shows how to extend the detection space and handle highly unstructured data when constructing the input matrix for CNN-based video steganalysis, paving the way for more effective deep learning networks for video steganalysis.
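A sketch of how the sparse MVDs might be packed into the two fixed-size input channels follows (the data layout and coordinate convention are assumptions, not the paper's exact construction):

```python
import numpy as np

def build_mvd_channels(mvds, height, width):
    """Build the two MVD input channels described above.
    `mvds` maps the top-left pixel coordinate of each 4x4 block that carries a
    motion vector difference to its (vertical, horizontal) MVD components;
    blocks without MVDs (intra-coded, or Merge/Skip mode) stay zero."""
    h4, w4 = height // 4, width // 4
    mvd_v = np.zeros((h4, w4), dtype=np.float32)
    mvd_h = np.zeros((h4, w4), dtype=np.float32)
    for (y, x), (dv, dh) in mvds.items():
        mvd_v[y // 4, x // 4] = dv
        mvd_h[y // 4, x // 4] = dh
    return np.stack([mvd_v, mvd_h])          # shape (2, H/4, W/4)

# toy example: two PUs with MVDs in a 64x64 frame
channels = build_mvd_channels({(0, 0): (1.0, -2.0), (16, 32): (0.0, 3.0)}, 64, 64)
print(channels.shape)                        # (2, 16, 16)
```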
Keywords: video steganalysis; deep learning; motion vector (MV); motion vector difference (MVD); detection space; sparse data; data sampling; input matrix construction
Published: 2024-05-07
    • Wenying Wen, Menglei Zou, Yuming Fang, Yushu Zhang, Yifan Zuo
      Vol. 28, Issue 3, Pages: 716-733(2023) DOI: 10.11834/jig.220371
Large-capacity data hiding and authentication method in encrypted domain for medical images
Abstract:
Objective: Telemedicine diagnosis has become a common health-seeking behavior in the big data era. To support quick and accurate diagnosis, a large amount of patients' sensitive information, such as identity information, medical profiles, and medical images, must be shared with doctors or medical institutions, so the security of medical information in transmission is essential. However, sensitive patient information is vulnerable to tampering or forgery during medical data sharing, which threatens the confidentiality, integrity, and privacy of the information and can mislead doctors' diagnoses. Telemedicine diagnosis therefore demands both information confidentiality and medical data integrity.
Method: To resolve these problems, we develop a high-capacity information hiding scheme in the encrypted domain. The scheme integrates semi-tensor product compressed sensing (STP-CS) with a high-capacity secret data embedding method, freeing about 3.75 bit/pixel of space in the encrypted domain of the carrier image for embedding patients' sensitive information. A low-order Gaussian chaotic matrix is constructed to compress and sample medical images via the semi-tensor product, which reduces the storage space of the measurement matrix exponentially and improves real-time reconstruction at the receiving end. Residual-value prediction provides the data embedding space, and data extraction is separated from image restoration. Specifically, according to the prediction residual matrix, the medical image data are divided into two categories: 1) sensitive data and 2) non-sensitive data. At the receiving end the data are split: part of the data is used for STP-CS reconstruction of the sensitive data, yielding a pre-reconstructed medical image, and the remaining data are used to extract the confidential patient information and the non-sensitive data, from which the pre-reconstructed medical image is further recovered via the residual matrix, giving the sharing scheme progressive reconstruction capability. Considering that an attacker may tamper with the image data during transmission, a data integrity authentication step is added, which completes authentication accurately under quantization noise and transmission noise; authenticating image data that has actually been tampered with remains an open issue.
Result: The scheme recovers STP-CS-reconstructed images to high quality with low-complexity reconstruction and verifies the integrity of the carrier image and the embedded secret information with high-efficiency authentication. STP-CS both reduces the storage of the measurement matrix many-fold and improves the real-time performance of reconstruction. To verify effectiveness, we select 4 MRI (magnetic resonance imaging) images of different parts of the human body and evaluate the scheme against 3 popular image recovery algorithms on these images. The experimental results show that the peak signal-to-noise ratio (PSNR) is improved by about 8-10 dB over the second-best method. For security analysis we consider 1) key space, 2) histograms, and 3) entropy. The key space of the scheme is 10^16 × 10^16 × 10^24 × 10^24 = 10^80 ≈ 2^266 (≫ 2^100), which is sufficient to prevent brute-force attacks. The histograms of encrypted images with and without embedded secret information are flat and consistent, verifying that the scheme prevents attackers from obtaining histogram-based information. Additionally, the entropy of all encrypted images is close to the ideal value of 8, so the scheme resists entropy cryptanalysis. For robustness analysis, 1) cropping attack, 2) JPEG (joint photographic experts group) compression, and 3) noise interference are considered; the experimental results show good robustness against cropping and noise attacks, and the bit error rate (BER) of the secret information stays below 5%, preserving its integrity. Experiments at different resolutions show that higher resolution benefits reconstruction quality; the size of the semi-tensor measurement matrix can be matched to the required accuracy, changing the resolution does not degrade reconstruction quality, and the embedding of the secret information is guaranteed.
Conclusion: The scheme strengthens data confidentiality and offers new features, including step-by-step recovery, pixel compression, and high-capacity secret information embedding.
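For concreteness, the left semi-tensor product underlying STP-CS can be sketched as follows (matrix sizes are illustrative; the paper's Gaussian chaotic matrix construction is not reproduced):

```python
import numpy as np

def semi_tensor_product(a, b):
    """Left semi-tensor product A ⋉ B for matrices whose inner dimensions
    need not match: with A (m x n) and B (p x q), let t = lcm(n, p);
    A ⋉ B = (A ⊗ I_{t/n}) @ (B ⊗ I_{t/p})."""
    n, p = a.shape[1], b.shape[0]
    t = np.lcm(n, p)
    return np.kron(a, np.eye(t // n)) @ np.kron(b, np.eye(t // p))

# STP-CS idea: a low-order measurement matrix samples a much larger image,
# which shrinks measurement-matrix storage (a 32x64 matrix here, instead of
# the 128x256 matrix that ordinary matrix sampling would require)
rng = np.random.default_rng(1)
phi = rng.standard_normal((32, 64))          # low-order Gaussian matrix
x = rng.standard_normal((256, 256))          # image to sample
y = semi_tensor_product(phi, x)              # measurements via A ⋉ B
print(y.shape)                               # (128, 256)
```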
Keywords: medical image; high-capacity reversible hiding; semi-tensor product compressed sensing (STP-CS); security authentication; progressive reconstruction
Published: 2024-05-07
    • Ke Wang, Shaowu Wu, Xiaolin Yin, Jingqiao Fu, Bing Chen, Wei Lu
      Vol. 28, Issue 3, Pages: 734-748(2023) DOI: 10.11834/jig.220343
General variable-length code mapping-relevant reversible data hiding in JPEG bitstream
Abstract:
Objective: Data hiding embeds secret data into cover images, from which a certified receiver can extract it. Most data hiding methods, however, cannot restore the cover image after data extraction. Reversible data hiding (RDH) is a special class of data hiding that guarantees complete recovery of both the cover image and the secret data. JPEG is today's most popular image format, widely used across photography devices and the internet, so JPEG images are commonly used as RDH covers. For RDH schemes in joint photographic experts group (JPEG) images, performance is evaluated by embedding capacity, visual quality, and the file size increment of the marked image. One category of RDH in JPEG images is based on variable-length code (VLC) mapping, which leaves the visual quality of the marked image unchanged, so only embedding capacity and file size increment need optimizing. Most previous VLC-mapping methods improve the embedding capacity while keeping the file size unchanged; although neither visual quality nor file size changes, the embedding capacity is rather limited. To achieve sufficient embedding capacity with only a slight file size increment, we develop a new RDH method based on general VLC mapping, which provides larger embedding capacity with smaller file size increment.
Method: VLC-mapping RDH methods embed secret data by replacing used VLCs in the bitstream with unused ones. To obtain larger embedding capacity and smaller file size increment, we propose an intermediate VLC mapping mechanism. The default defined Huffman table (DHT) used to encode the DCT coefficients during JPEG compression may assign run/size values (RSVs) with high frequency to VLCs with long code lengths, so before embedding, RSV reordering is conducted and VLCs with shorter code lengths are allocated to RSVs with higher frequencies. After RSV reordering, coding redundancy is reduced and the VLC frequency distribution can measure the performance of VLC mapping relationships. We then analyze how intermediate VLC mapping and direct VLC mapping affect the file size of the marked image: intermediate mapping preserves the file size better than direct mapping when the frequency of the selected intermediate VLC is less than the number of 1 bits in the secret data. An optimal intermediate VLC mapping model follows: for each code length, the VLC with the smallest frequency is selected as the intermediate VLC, a mapping from used to unused VLCs is constructed, and the file size increment is minimized. Finally, the method constructs the optimal intermediate VLC mapping for the given embedding capacity and JPEG bitstream. For extraction, the certified receiver can reconstruct the VLC mapping without auxiliary information.
Result: Experiments on the USC-SIPI database show that the embedding capacity is greatly improved, by factors of 5 to 40 across cover images, compared with previous VLC-mapping RDH methods. When 1.8 × 10^4 bits of secret data are embedded in cover JPEG images with quality factor 90, our file size increment is reduced by up to 42%. The experiments also verify the effectiveness of RSV reordering: the proposed mapping reduces the file size increment while providing a large embedding capacity.
Conclusion: A new strategy for constructing VLC mappings, optimal intermediate VLC mapping, realizes RDH in the JPEG bitstream that keeps the cover image unchanged, provides larger embedding capacity, and further reduces the file size increment.
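A structural sketch of VLC-mapping embedding and extraction follows (toy codewords and a single-pair mapping; real JPEG RDH parses the entropy-coded bitstream and the DHT, which is omitted here):

```python
# Each used VLC u is paired with one unused VLC u' of the same code length,
# so replacing u with u' encodes bit 1 and keeping u encodes bit 0, leaving
# both the decoded image and the file size unchanged.
def embed_bits(vlc_stream, mapping, bits):
    """vlc_stream: list of VLC codewords; mapping: used -> unused codeword."""
    out, it = [], iter(bits)
    for code in vlc_stream:
        if code in mapping:
            b = next(it, None)
            out.append(mapping[code] if b == 1 else code)
        else:
            out.append(code)                 # non-mapped codes pass through
    return out

def extract_bits(marked, mapping):
    inverse = {v: k for k, v in mapping.items()}
    bits = []
    for code in marked:
        if code in mapping:
            bits.append(0)                   # still the used VLC
        elif code in inverse:
            bits.append(1)                   # replaced by the unused VLC
    return bits

stream = ["1010", "110", "1010", "0111"]
mapping = {"1010": "1011"}                   # same code length keeps file size
marked = embed_bits(stream, mapping, [1, 0])
assert extract_bits(marked, mapping) == [1, 0]
```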
Keywords: reversible data hiding (RDH); variable-length code (VLC); defined Huffman table (DHT); JPEG bitstream; file size increment; embedding capacity
Published: 2024-05-07
    • Jian Wang, Kejiang Chen, Weiming Zhang, Nenghai Yu
      Vol. 28, Issue 3, Pages: 749-759(2023) DOI: 10.11834/jig.220529
Invertible image processing network-based provably secure natural steganography
Abstract:
Objective: Natural steganography is a cover-source switching based image steganography method: to enhance steganographic security, it makes the steganographic image follow the cover features of a different source. Natural steganography was originally designed around ISO (International Organization for Standardization) sensitivity: noise is added to a low-ISO image to give it the characteristics of a high-ISO image, and this noise signal is modeled to carry the embedded message. The approach requires modeling the generation of ISO sensor noise along the development pipeline from raw sensor data in RAW format to common formats such as portable network graphics (PNG) or joint photographic experts group (JPEG) images, which is very complex and not precise enough, so existing natural steganography approaches cannot achieve provable security. To make stylized images generated by steganography indistinguishable from other stylized images, some approaches explore steganography based on image style transfer; however, it is difficult to guarantee that a steganographically generated stylized image has the same distribution as a stylized image from another source, and none of these methods is as secure as traditional natural steganography. Using generated images as covers, in contrast, makes provable security attainable. Steganography offers stronger invisibility than cryptography, but provable security has been hard to reach, and most methods offer only empirical security. Existing provably secure methods require either the exact distribution of the cover dataset or the ability to sample exactly from the cover distribution, which is infeasible for traditional cover datasets; generated datasets allow exact sampling because generative models introduce random variables to drive data generation. We therefore use a normalizing-flow-based invertible image processing network to accomplish cover-source switching in the latent space and achieve provably secure natural steganography.
Method: First, the image is mapped back to the latent space by the invertible neural network based image processing method, and the distribution of the latent variables determines the cover features. Taking the latent variable as the object of cover-source switching avoids modeling original image features such as ISO sensitivity and reduces the complexity of the steganography significantly. At the same time, provably secure steganography is realized inside the invertible image processing: after source switching, the cover follows exactly the same distribution as a cover from the other source, whereas traditional natural steganography can only approximate it. Since most invertible image processing networks use normally distributed latent variables in recovery, we design the conditional probability distribution of the stego latent variables given the uniformly distributed message so that the stego latent variables still follow the normal distribution while remaining correlated with the message, enabling message embedding and extraction. The message is embedded efficiently via inverse transform sampling, simplified to regular-interval sampling, producing stego latent variables that match the target conditional distribution. Because the stego latent variables keep the same normal distribution as the latent variables used in normal image processing, the recovered or generated images share the same distribution as well.
Result: Experiments cover image quality, steganographic capacity, message extraction accuracy, steganographic security, and running time. Using the same quality evaluation as the original image processing networks, there is almost no difference in peak signal-to-noise ratio or structural similarity between steganographic and non-steganographic images. For capacity, the steganography built on the invertible denoising and invertible rescaling networks extracts the message with about 99% accuracy while embedding 5.625 bits per pixel of the stego image, and the invertible decolorization based method achieves more than 99% extraction accuracy at 0.67 bits per pixel. For security, we prove the security of the proposed method, and neither of the two deep learning steganalysis networks tested can distinguish stego from cover images better than random guessing. Furthermore, our improved message mapping algorithm reduces the running time from the exponential growth of the rejection sampling approach to constant time.
Conclusion: A provably secure natural steganography system is built on an invertible image processing network, and the experimental results show that the method performs well in both steganographic capacity and security.
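The inverse-transform-sampling embedding can be sketched as follows (the chunking and interval layout are assumptions; the paper's simplified sampling may differ). The key property is that the stego latents are still exactly standard-normal:

```python
import numpy as np
from scipy.stats import norm

def embed_bits_in_latent(bits, k=1, seed=0):
    """Map k-bit message chunks to standard-normal latent samples by inverse
    transform sampling: chunk value m selects the m-th of 2^k equal-probability
    intervals of N(0,1), and a point is drawn uniformly inside that interval.
    For uniform messages the output is still exactly N(0,1)-distributed."""
    rng = np.random.default_rng(seed)
    chunks = [int("".join(map(str, bits[i:i + k])), 2)
              for i in range(0, len(bits), k)]
    u = (np.array(chunks) + rng.random(len(chunks))) / 2 ** k
    return norm.ppf(u)                       # inverse CDF -> Gaussian latents

def extract_bits(latents, k=1):
    m = np.floor(norm.cdf(latents) * 2 ** k).astype(int)
    return [int(b) for v in m for b in format(v, f"0{k}b")]

z = embed_bits_in_latent([1, 0, 1, 1])
assert extract_bits(z) == [1, 0, 1, 1]
```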
Keywords: steganography; natural steganography; provably secure steganography; invertible neural network (INN); image processing
Published: 2024-05-07
    • Yong Xu, Zhihua Xia
      Vol. 28, Issue 3, Pages: 760-774(2023) DOI: 10.11834/jig.220541
      Sender-deniable image steganography
Abstract:
Objective: Steganography hides secret messages in an irrelevant cover, generates a stego object, and transmits it over a public channel without arousing suspicion. Its antithesis, steganalysis, identifies whether secret messages are hidden in data and thus poses a constant security risk to steganography. As a result, adversaries can mount coercive attacks on the sender or receiver during covert communication: finding the sender or receiver and coercing them to surrender the secret message for verification. To resist coercive attacks and protect information security, the concept of deniable steganography has been proposed, and a general framework for receiver-deniable steganography based on deep neural networks has been designed. Research on sender-deniable steganography, however, is still in its infancy, because it is difficult to generate the same stego object from a different secret message. This paper considers sender-deniable steganography extensively. First, related works and development trends of deniable schemes and image steganography are introduced from the two perspectives of attack and defense: 1) coercive attack versus sender-deniable schemes and 2) image steganography versus steganalysis. Next, we clarify the coercive attack on the sender, the requirement to submit the communicated information, and the possibility of verifying an identical stego object.
Method: We develop a framework for sender-deniable image steganography and design two schemes: invertible neural network based sender-deniable image hiding (Scheme 1) and deniable encryption based sender-deniable image steganography (Scheme 2). In both schemes the sender can use a fake secret message to generate a stego image identical to the one in the adversary's hands, deceiving the adversary, escaping the coercive attack, and protecting the real secret message. In Scheme 1, the invertible neural network is reused twice for image concealing and revealing: the secret image is concealed in a cover image to generate a stego image, and this stego image is concealed in another cover image to generate a second stego image for covert communication. If the adversary coerces the sender, the sender can reproduce the second stego image from the first stego image; the adversary is taken in by the first stego image, and the secret image remains private. In Scheme 2, steganography is combined with deniable encryption into a generic sender-deniable steganography scheme. For instance, the secret message is encrypted into a ciphertext by an XOR (exclusive OR) operation with a real key; conversely, a fake key can be constructed from the very same ciphertext and a different fake message. The sender embeds the ciphertext into the cover as usual, and when a coercive attack happens, the sender can dishonestly open the ciphertext with the fake message and fake key; the adversary verifies the fake message while the real one stays unknown.
Result: The experimental results show that both schemes achieve sender deniability and maintain the visual quality of stego images, with peak signal-to-noise ratio (PSNR) exceeding 37 dB and structural similarity (SSIM) exceeding 0.9; the message extraction error rate remains zero in Scheme 2. Nonetheless, a sender-deniable steganography scheme that withstands a malicious coercive attack has not been achieved yet. The limitations and challenges of the two schemes are discussed with respect to the form of the secret, extraction accuracy, encryption efficiency, and security against coercive attack during the verification process.
Conclusion: The proposed sender-deniable image steganography framework enables the sender to deceive the adversary and secure the secret against coercive attack, and the two constructed schemes largely maintain steganographic performance while achieving deniability. We expect that steganography combined with neural network techniques (e.g., repeatable data hiding, equivariant convolutional networks) will enable further feasible constructions.
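The XOR-based deniable encryption at the heart of Scheme 2 can be illustrated directly (one-time-pad style, with a hypothetical message pair of equal length):

```python
import secrets

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

real_msg = b"meet at dawn"
real_key = secrets.token_bytes(len(real_msg))
cipher   = xor(real_msg, real_key)          # ciphertext embedded in the cover

# Under coercion, construct a fake key that "opens" the same ciphertext
# to an innocuous fake message of the same length:
fake_msg = b"happy bday!!"
fake_key = xor(cipher, fake_msg)
assert xor(cipher, fake_key) == fake_msg    # adversary verifies the fake
assert xor(cipher, real_key) == real_msg    # receiver still gets the truth
```

Because any plaintext of the right length has some key that maps it to the given ciphertext, the fake opening is indistinguishable from the real one.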
Keywords: information hiding; steganography; covert communication; coercive attack; deniable encryption; invertible neural network
Published: 2024-05-07

      Image Forensics

    • Yulin Zhang, Hongxia Wang, Rui Zhang, Jingyuan Zhang
      Vol. 28, Issue 3, Pages: 775-788(2023) DOI: 10.11834/jig.220549
Semantic consistency-relevant multi-task splicing tampering detection
Abstract:
Objective: Image editing and modification software keeps emerging, raising forensic concerns about digitally faked images. Splicing, which inserts new instances into an original image, is a common way to forge an image and misinterpret its semantics. Conventional convolutional neural network (CNN) based methods detect anomalies in forged images mainly from statistical and physical features of the image itself, such as edge features and noise features, but semantic inconsistencies remain a challenge for them. In addition, tampered-image detection is challenged by human-applied post-processing such as compression or image filters.
Method: To detect spliced forgeries, we combine semantic segmentation and noise reconstruction in a CNN- and multi-resolution-based detection network. The proposed network consists of four parts: 1) an RGB stream, 2) a noise stream, 3) a fusion module, and 4) a multi-task module. The RGB stream extracts tampering boundary artifacts and semantic information. A steganalysis-derived filter layer extracts the noise features of the forged regions, so that RGB and noise information together support multifaceted forgery detection. The semantic segmentation task captures semantic inconsistencies; the noise reconstruction task leads the network to learn a more diversified image noise distribution; and the forgery detection task locates the tampered regions. As in recent popular multi-task networks, each task has its own loss function, and the sum of the per-task losses is the overall loss of the network, as shown in the sketch after this abstract. To strengthen the spatial co-occurrence of the two feature types, the fusion module fuses the RGB-stream and noise-stream features before they enter the forgery detection task. Additionally, a multi-resolution pathway is built into the RGB stream, the noise stream, and the fusion module to obtain richer and more precise features: it helps the network perceive both semantic and precise location information, which benefits the localization-oriented forgery detection task.
Result: Comparative experiments are conducted against six tampering detection networks, 1) manipulation tracing network (ManTra-Net), 2) coarse-to-refined network (C2RNet), 3) multi-task wavelet corrected network (MWC-Net), 4) compression artifact tracing network (CAT-Net), 5) ringed residual U-Net (RRU-Net), and 6) a high-resolution network (HRNet) based baseline, on the Fantastic Reality and Spliced datasets. Models are trained and tested on an Intel Core i7-9700K CPU and an NVIDIA GeForce RTX 2080Ti GPU. Training uses stochastic gradient descent with momentum 0.9, an initial learning rate of 0.005, and exponential decay. The F1 scores on Fantastic Reality and Spliced Dataset are 0.946 and 0.961, respectively. Runtime comparisons show that our design balances computational cost and network capability. Common post-processing includes JPEG compression, while image filters adjust contrast and brightness; to reflect natural scenarios, robustness experiments on the Fantastic Reality dataset therefore cover four kinds of human-applied post-processing: JPEG compression, contrast adjustment, brightness adjustment, and noise distortion.
Conclusion: A semantic consistency based multi-task, multi-resolution tampering detection network is presented to detect forged regions effectively and accurately. The multi-task strategy extracts semantic features and detects forged regions through semantic inconsistencies in forged images, while the multi-resolution design lets the network gather more diversified image information. Robustness experiments further show that the network withstands post-processing such as JPEG compression.
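The summed multi-task objective can be sketched in PyTorch (the loss types and equal weighting are assumptions; the abstract states only that the overall loss is the sum of the per-task losses):

```python
import torch
import torch.nn as nn

seg_loss_fn = nn.CrossEntropyLoss()     # semantic segmentation task
rec_loss_fn = nn.MSELoss()              # noise reconstruction task
det_loss_fn = nn.BCEWithLogitsLoss()    # tampered-region detection task

def total_loss(seg_out, seg_gt, noise_out, noise_gt, det_out, det_gt):
    """Overall network loss = sum of the three task losses."""
    return (seg_loss_fn(seg_out, seg_gt)
            + rec_loss_fn(noise_out, noise_gt)
            + det_loss_fn(det_out, det_gt))

# toy tensors: batch of 2, 4 semantic classes, 64x64 prediction maps
loss = total_loss(
    torch.randn(2, 4, 64, 64, requires_grad=True), torch.randint(0, 4, (2, 64, 64)),
    torch.randn(2, 1, 64, 64, requires_grad=True), torch.randn(2, 1, 64, 64),
    torch.randn(2, 1, 64, 64, requires_grad=True), torch.rand(2, 1, 64, 64))
loss.backward()
```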
Keywords: image tampering detection; semantic consistency; multi-task strategy; multi-resolution; high-resolution network (HRNet)
Published: 2024-05-07
    • Jianqiu Zhu, Yang Hua, Xiaoning Song, Xiaojun Wu, Zhenhua Feng
      Vol. 28, Issue 3, Pages: 789-803(2023) DOI: 10.11834/jig.220513
      Human facial texture and fore-background differences-relevant anti-spoofing detection
Abstract:
Objective: Face recognition systems are vulnerable to spoofing attacks in which forged face information passes verification. Face anti-spoofing (FAS), which provides a security barrier for face recognition systems in practice, has therefore drawn growing attention in recent years. Conventional FAS methods rely on handcrafted features and shallow classifiers; they handle only specific attacks because they are sensitive to variations in facial appearance such as pose, expression, illumination, makeup, and occlusion. In contrast, end-to-end trained deep neural network (DNN) based face anti-spoofing algorithms can distinguish real from spoofed faces in unconstrained scenarios. Nevertheless, existing deep learning based FAS methods still face two problems: 1) the computational complexity of universal convolution layers and 2) complex background distractions. Although background diversity helps the robustness of a trained network, performance may degrade when background contour and exposure level become the decisive cues for anti-spoofing: a trained model may rely heavily on background features, pay too little attention to the facial area, and generalize poorly to unseen scenarios.
Method: To resolve these issues, we develop a face anti-spoofing model based on facial texture information and fore-background difference analysis. The model has two main modules: 1) facial texture analysis (FTA) and 2) fore-background difference analysis (FBDA). First, FTA extracts rich facial information for FAS: a mask keeps only the face region of the input image, and ConvNeXt extracts features from the masked face. Although the facial information extracted by FTA effectively suppresses interference from background differences, FTA alone may overfit the training data for lack of background information, so it must be coordinated with FBDA. Second, to reduce the computational complexity of the convolution kernels and eliminate redundancy in the extracted features, the convolution kernels of the backbone network in the FBDA module are redesigned following the mechanism of edge detection. The proposed kernels consist of 1) Sobel horizontal (vertical) kernels for detecting horizontal (vertical) edge information and 2) a convex kernel for detecting face contours. They have fewer parameters than universal convolution kernels, improving the efficiency of the deep network, and cascading them lets the network capture finer-grained information from the input image. Finally, an attention fusion module balances the contributions of the two modules, emphasizing the extracted facial texture information for spoofing detection; this multi-scenario-oriented attention fusion module improves the reliability and robustness of the proposed method. The method is validated on three datasets: CASIA-MFSD, Replay-Attack, and OULU-NPU. In our experiments, comparative analysis uses the standard evaluation metrics: 1) equal error rate (EER), 2) half total error rate (HTER), and 3) average classification error rate (ACER).
Result: Our method achieves an EER of 0.19% on CASIA-MFSD, 0.00% EER and 0.00% HTER on Replay-Attack, and ACERs of 0.6%, 1.9%, 1.9±1.2%, and 3.7±1.1% on the four OULU-NPU protocols. To assess performance in unseen environments, cross-dataset evaluations between CASIA-MFSD and Replay-Attack are also conducted: trained on CASIA-MFSD, the model achieves 17.1% HTER on the Replay-Attack test set; trained on Replay-Attack, it achieves 27.4% HTER on the CASIA-MFSD test set. The comparisons show lower error rates on several complex datasets and better behavior in unseen environments.
Conclusion: A novel face anti-spoofing model built on fore-background difference information and facial texture information effectively alleviates the difficulties posed by complex backgrounds and yields a more robust trained network. By assigning more attention weight to the extracted facial texture features, it improves generalization to unseen scenarios and offers a practical strategy for face spoofing detection. One limitation is that accuracy is still not optimal on datasets with a single background.
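The fixed edge-detection kernels described above can be written out as follows (the Sobel pair is standard; the "convex" kernel values are an assumption, since the paper's exact kernel is not given here):

```python
import numpy as np
from scipy.signal import convolve2d

sobel_h = np.array([[-1, -2, -1],
                    [ 0,  0,  0],
                    [ 1,  2,  1]], dtype=float)   # horizontal edge response
sobel_v = sobel_h.T                               # vertical edge response
convex  = np.array([[ 0, -1,  0],
                    [-1,  4, -1],
                    [ 0, -1,  0]], dtype=float)   # Laplacian-like, assumed

face = np.random.rand(112, 112)                   # stand-in face crop
edges_h = convolve2d(face, sobel_h, mode="same")
edges_v = convolve2d(face, sobel_v, mode="same")
contour = convolve2d(face, convex, mode="same")
```

Because these kernels are fixed rather than learned, the corresponding layers carry far fewer trainable parameters than universal convolutions, which is the efficiency argument made above.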
Keywords: face anti-spoofing detection (FAS); edge detection; texture feature; attention feature fusion; face recognition
Published: 2024-05-07
    • Ying Li, Shan Bian, Chuntao Wang, Wei Lu
      Vol. 28, Issue 3, Pages: 804-819(2023) DOI: 10.11834/jig.220519
      CNN and Transformer-coordinated deepfake detection
摘要:Objective: Deepfake detection has recently become one of the hottest research topics in the effort to counter deepfake videos. Its purpose is to identify fake videos synthesized by deep forgery techniques and spread on social networks such as WeChat, Instagram, and TikTok. Most existing methods extract forged features with a convolutional neural network (CNN) and compute the final classification score with a classifier built on those features. When facing deepfake videos of low quality or high compression, these methods improve detection performance by extracting deeper spatial-domain information. However, the forged traces left in the spatial domain diminish as compression increases, and local features tend to become similar, which degrades performance severely. This motivates us to retain the frequency-domain information of forged image artifacts as an additional forensic clue, since it suffers less interference from JPEG compression. CNN-based spatial feature extraction can capture facial artifacts by stacking convolutions, but its receptive field is limited: it models local information well yet ignores the relationships between distant pixels. The Transformer excels at long-range dependency modeling in natural language processing and computer vision tasks, so it is employed here to model the relationships between image pixels and compensate for the CNN's weakness in acquiring global information. However, the Transformer can only process sequence information, so it still needs a CNN's cooperation in computer vision tasks. Method: We develop a joint detection model that leverages the complementary advantages of the CNN and the Transformer and enriches the feature representation with frequency-domain information. EfficientNet-b0 serves as the feature extractor. In the spatial feature-extraction stage, an attention module is embedded in the shallow layers, and the deep features are multiplied by the activation map produced by the attention module so that more forensic features are retained. In the frequency-domain stage, the discrete cosine transform (DCT) is used as the frequency transform, and an adaptive component is added to the frequency-band decomposition. Mixed-precision training is adopted to accelerate training and reduce memory use. The two feature-extraction branches are then linked to a modified Transformer structure: an encoder models inter-region feature correlation through global self-attention, and cross-attention is computed between the branches so that the dual-domain features interact. Finally, a random data-augmentation strategy is designed to work with the attention mechanism and improve detection accuracy in cross-compression-rate and cross-dataset scenarios. Result: The joint model is compared with nine state-of-the-art deepfake detection methods on two datasets, FaceForensics++ (FF++) and Celeb-DF. In cross-compression-rate experiments on FF++, detection accuracy reaches 90.35%, 71.79%, and 80.71% on images manipulated by Deepfakes, Face2Face, and NeuralTextures (NT), respectively. In cross-dataset experiments, i.e., training on FaceForensics++ and testing on Celeb-DF, the training time is also reduced. Conclusion: The experiments demonstrate that the proposed joint model improves cross-dataset and cross-compression-rate detection accuracy. It combines the strengths of EfficientNet and the Transformer with dual-domain features, attention, and data augmentation, making detection more accurate and efficient.
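To make the dual-branch design above concrete, the following is a minimal PyTorch sketch, not the authors' implementation: a tiny stand-in CNN replaces EfficientNet-b0, an orthonormal 2D DCT supplies the frequency branch, and a single nn.MultiheadAttention layer stands in for the modified Transformer's cross-attention fusion. All module names, sizes, and the two-class head are illustrative assumptions.

import math
import torch
import torch.nn as nn

def dct2(x: torch.Tensor) -> torch.Tensor:
    # Orthonormal 2D DCT-II over the last two (square) dimensions.
    n = x.shape[-1]
    k = torch.arange(n, dtype=x.dtype, device=x.device)
    basis = torch.cos(math.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    basis[0] /= math.sqrt(2.0)
    basis = basis * math.sqrt(2.0 / n)
    return basis @ x @ basis.T

class Branch(nn.Module):
    # Tiny stand-in for the EfficientNet-b0 backbone used in the paper.
    def __init__(self, dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, dim, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.net(x)                       # (B, dim, H/4, W/4)
        return f.flatten(2).transpose(1, 2)   # (B, N, dim) token sequence

class DualDomainDetector(nn.Module):
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.spatial = Branch(dim)
        self.freq = Branch(dim)
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, 2)         # real vs. fake logits

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s = self.spatial(x)                   # spatial-domain tokens
        f = self.freq(dct2(x))                # frequency-domain tokens
        fused, _ = self.cross(s, f, f)        # spatial queries attend to frequency keys/values
        return self.head(fused.mean(dim=1))

logits = DualDomainDetector()(torch.randn(2, 3, 64, 64))
print(logits.shape)   # torch.Size([2, 2])

The key design point the sketch illustrates is that one branch queries the other: cross-attention lets each spatial region weigh frequency-domain evidence, rather than simply concatenating the two feature maps.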
      关键词:DeepFake detection;convolutional neural network(CNN);Vision Transformer(ViT);spatial domain;frequency domain   
    • Wenqi Zhuo,Dongze Li,Wei Wang,Jing Dong
      Vol. 28, Issue 3, Pages: 820-835(2023) DOI: 10.11834/jig.220559
      Data-free model compression for light-weight DeepFake detection
摘要:Objective: Deep generative models have made it easy to manipulate human facial images and videos. To cope with such forgeries, DeepFake detection (DFD) techniques have emerged, and many DFD methods now discriminate real from fake faces with over 95% accuracy. However, deploying them online remains a great challenge because of their memory and computational cost, so we bring model quantization into the DFD domain. Quantization-based model compression reduces model size by converting a model's parameters from high-precision floating-point numbers into low-precision integers, but the resulting accuracy degradation is still a challenge. Approaches to this degradation problem fall into two categories: 1) quantization-aware fine-tuning and 2) post-training quantization. For cost-effectiveness, we adopt the latter to develop a lightweight DFD detector. In addition, out of privacy and information-security concerns, the models are quantized in a data-free scenario, i.e., without access to the original training set. Method: The proposed framework consists of two steps: 1) parameter quantization and 2) activation-range calibration. First, the weights and activations of a well-trained, high-accuracy DFD model are selected as the parameters to be quantized. An asymmetric linear transformation converts them from 32-bit floating point into a lower bit-width representation such as INT8 or INT6. Next, the activation ranges are calibrated on a calibration set. In a data-free scenario it is difficult to collect data from the original training set, so, to produce effective calibration data, the batch-normalization statistics of the pre-trained DFD model are used to guide a generator: such statistics, e.g., running means and variances, reflect the distribution of the training data. Inputs sampled randomly from a standard Gaussian distribution are optimized under an L2-norm constraint so that their statistics match those stored in the batch-normalization layers. Furthermore, to reduce the accuracy loss, ReLU6 is adopted as the activation function in all DFD models; ReLU6 gives activations a natural range of [0, 6], which benefits quantization. The synthesized data are then fed into the quantized model, and the activation ranges are calibrated during the forward inference pass. Result: The proposed scheme is tested with popular DFD models — ResNet50, Xception, EfficientNet-b3, and MobileNetV2 — on the widely used deepfake datasets FaceForensics++ and Celeb-DF v2. On FaceForensics++, Xception and MobileNetV2 achieve accuracy scores of 93.98% and 92.25% under W8A8 quantization, differing from their full-precision baselines by only 0.01% and 0.92%, respectively. ResNet50 reaches 92.56% under W6A8. The performance of EfficientNet-b3 still needs further calibration. On Celeb-DF v2, MobileNetV2 under W8A8, W8A6, and W6A6 gains 0.07%, 0.77%, and 0.09% in accuracy, respectively, over its full-precision baseline. For three of the DFD models, the quantized versions keep detection accuracy above 92%, even under W6A6 quantization. We also compare with "DefakeHop", a related work that likewise builds a lightweight DFD network: the quantized DFD models obtain higher AUC (area under the ROC curve) scores on public datasets, although their parameter counts are unchanged and remain larger than DefakeHop's; in fact, the proposed scheme could be applied to compress DefakeHop further. To evaluate the approach thoroughly, a series of ablation experiments analyzes the impact of the weight and activation bit-width settings, the type of calibration data, and the activation function. Conclusion: Model-compression methods are brought into DFD tasks, and a data-free post-training quantization scheme is developed that converts a pre-trained DFD model into a lightweight one. Experiments on FaceForensics++ and Celeb-DF v2 with a range of typical DFD models, including ResNet50, Xception, EfficientNet-b3, and MobileNetV2, show that the quantized DFD models recognize fake faces accurately and efficiently. Future research may deploy DFD models online or on resource-constrained platforms such as mobile and edge devices.
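The two steps of the framework can be sketched in a few lines of PyTorch. This is a hedged illustration, not the paper's code: the asymmetric quantizer is per-tensor for brevity, and where the paper guides a generator with batch-normalization statistics, the sketch simplifies by optimizing the Gaussian noise input directly against the stored running means and variances. All names and hyperparameters are assumptions.

import torch
import torch.nn as nn

def fake_quantize(x: torch.Tensor, bits: int = 8) -> torch.Tensor:
    # Asymmetric per-tensor linear quantization followed by dequantization.
    qmax = 2 ** bits - 1
    scale = (x.max() - x.min()).clamp(min=1e-8) / qmax
    zero = torch.round(-x.min() / scale)
    q = torch.clamp(torch.round(x / scale) + zero, 0, qmax)
    return (q - zero) * scale

def synthesize_calibration(model: nn.Module, shape=(8, 3, 64, 64), steps=50):
    # Optimize Gaussian noise so that its statistics at every BatchNorm input
    # match the running means/variances stored in the pre-trained model
    # (the data-free substitute for real calibration images).
    model.eval()
    captured = []
    hooks = [m.register_forward_hook(lambda mod, inp, out: captured.append((mod, inp[0])))
             for m in model.modules() if isinstance(m, nn.BatchNorm2d)]
    x = torch.randn(shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=0.1)
    for _ in range(steps):
        captured.clear()
        model(x)
        loss = torch.zeros(())
        for bn, feat in captured:
            mu = feat.mean(dim=(0, 2, 3))
            var = feat.var(dim=(0, 2, 3), unbiased=False)
            loss = loss + (mu - bn.running_mean).pow(2).mean() \
                        + (var - bn.running_var).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    for h in hooks:
        h.remove()
    return x.detach()

# Toy usage: synthesize calibration data, then fake-quantize the weights.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU6())
calib = synthesize_calibration(model)
with torch.no_grad():
    for p in model.parameters():
        p.copy_(fake_quantize(p, bits=8))

In a full pipeline, the synthesized batch would then be run through the quantized model once to record per-layer activation ranges, which is the calibration step the abstract describes.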
      关键词:DeepFake detection;fake face;model compression;low bit-width representation;data-free distillation;light-weight model   

      Model Security

    • Yusheng Guo,Zhenxing Qian,Xinpeng Zhang,Hongfeng Chai
      Vol. 28, Issue 3, Pages: 836-849(2023) DOI: 10.11834/jig.220421
      Non-semantic information suppression relevant backdoor defense implementation
摘要:Objective: The emerging convolutional neural networks (CNNs) have shown their potential in computer science, electronic information, mathematics, and finance. However, their security is challenged in many of these domains. In supervised learning, an attacker who adds trigger-bearing samples to the training set and changes their labels to a target label can make the trained model predict any trigger-bearing sample as the target label at inference time. Such backdoor attacks severely threaten the interests of model owners, especially in high value-added areas such as financial security. A series of defense strategies has been proposed against backdoored neural network models, but conventional methods often require prior knowledge of the attack or of the model, such as the type and size of the trigger; this assumption is often unrealistic and limits the applicable scenarios of these defenses. To resolve this problem, we develop a backdoor defense method for image classification based on input modification, called the information purification network (IPN). Processing samples through the IPN eliminates the effect of any embedded trigger. Method: Image samples carry a large amount of redundant information, so we divide image information into two categories: 1) semantic information relevant to the classification task and 2) non-semantic information irrelevant to it. For a trigger-bearing sample to be predicted as the target label, a backdoor attack must force the model to attend to the sample's non-semantic information during training. To suppress such trigger noise, the IPN is a CNN that encodes and decodes input samples, aiming to minimize the non-semantic information in the original samples while keeping the image semantics unchanged. Clean samples are its inputs, and the modified samples are its outputs. For training, several clean classifiers with different structures and training hyperparameters are first trained. The IPN is then optimized to make the difference between the modified sample and the original sample as large as possible, under the constraint that the modified sample is still correctly predicted by those classifiers. The loss function therefore has two parts: 1) semantic-information retention and 2) non-semantic-information suppression, whose weights are balanced to control how far the modified sample may drift from the original. Decoding by the IPN disrupts the structure of any trigger, so a sample will not be predicted as the target label even if the model has been backdoored. Moreover, because the semantic information is not weakened, trigger-bearing samples are predicted with their correct labels whether or not the model contains a backdoor. Result: All experiments run on an NVIDIA GeForce RTX 3090 graphics card under Python 3.8.5 with PyTorch 1.9.1. The datasets are CIFAR10, MNIST, and ImageNet10. The ImageNet10 dataset is constructed by randomly selecting 10 categories from the ImageNet dataset, 12 831 images in total; we randomly select 10 264 images for training and use the remaining 2 567 for testing. The IPN uses a U-Net architecture. A variety of triggers are used to implement backdoor attacks and evaluate the defense in detail. On MNIST, the clean model classifies clean samples with 99% accuracy. With two different triggers, the average classification accuracy on clean samples stays at 99% and the backdoor attack success rates are 100%. After all samples are encoded and decoded by the IPN, the classification accuracy on clean samples remains unchanged, the backdoor attack success rate drops to 10%, and 98% of backdoor samples are predicted with their correct labels. The results on the other two datasets are similar: the accuracy on clean samples decreases slightly, the backdoor attack success rate falls to about 10%, and backdoor samples are correctly predicted with high accuracy. Note that the intensity and size of the triggers affect the defensive performance to a certain extent, and the weight between the two parts of the loss function affects the accuracy on clean samples: the weight of the non-semantic-information suppression loss is positively correlated with the image difference and negatively correlated with the clean-sample classification accuracy. Conclusion: The proposed strategy requires no prior knowledge of the triggers or of the models to be protected. The classification accuracy on clean samples stays unchanged, the backdoor attack success rate drops to the level of random guessing, and backdoor samples are predicted with their correct labels, regardless of whether the classifier has been backdoored. Training the IPN requires only clean training data and knowledge of the protected model's task. At deployment, the IPN is simply placed in front of the protected model to preprocess its input samples. Simulations of multiple backdoor attacks on the three datasets show that the defense generalizes well across attacks.
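The two-part objective is the core of the method, so here is a minimal sketch of it in PyTorch. The paper uses a U-Net purifier; a small convolutional autoencoder stands in for it here, and the class names, toy classifier, and the weight alpha are all hypothetical.

import torch
import torch.nn as nn

class TinyPurifier(nn.Module):
    # Small convolutional autoencoder standing in for the U-Net purifier.
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, 16, 3, 2, 1), nn.ReLU(),
                                 nn.Conv2d(16, 32, 3, 2, 1), nn.ReLU())
        self.dec = nn.Sequential(nn.ConvTranspose2d(32, 16, 4, 2, 1), nn.ReLU(),
                                 nn.ConvTranspose2d(16, 3, 4, 2, 1), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.dec(self.enc(x))

def ipn_loss(purifier, classifiers, x, y, alpha=0.1):
    # Semantic retention: the clean classifiers must stay correct on the
    # modified sample. Non-semantic suppression: the modified sample is
    # pushed as far from the input as the first term allows.
    x_mod = purifier(x)
    ce = nn.CrossEntropyLoss()
    semantic = sum(ce(clf(x_mod), y) for clf in classifiers)
    non_semantic = -((x_mod - x) ** 2).mean()   # maximize the difference
    return semantic + alpha * non_semantic

# One toy training step with a single stand-in classifier:
clf = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
purifier = TinyPurifier()
x, y = torch.rand(4, 3, 32, 32), torch.randint(0, 10, (4,))
ipn_loss(purifier, [clf], x, y).backward()

Because only the classifiers' decisions constrain the output, anything the classifiers do not rely on — including a trigger pattern — is free to be destroyed, which is exactly the purification effect the abstract describes.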
      关键词:convolutional neural network(CNN);model security;image classification;neural network backdoor;backdoor defense   
    • Junjie Zhao,Jinwei Wang,Junfeng Wu
      Vol. 28, Issue 3, Pages: 850-863(2023) DOI: 10.11834/jig.220516
      Adversarial attack method identification model based on multi-factor compression error
摘要:Objective: Deep neural networks (DNNs) have greatly advanced artificial intelligence (AI) applications such as image classification and face recognition. However, recent studies have shown that DNNs are vulnerable to small changes in their input: injecting a small, artificially designed adversarial noise into a sample can cause the DNN to misclassify it. Such an anomalous input is called an adversarial example. Existing detectors already identify adversarial examples with high accuracy, but to assess the security level of a deep neural network and to deploy targeted defenses, the attack method itself must also be classified. Adversarial attacks fall into two categories: 1) white-box and 2) black-box attacks. In a white-box attack, the attacker has full prior knowledge of the target network and can obtain the gradient of the loss function for an example as well as query the network's outputs; white-box attacks are mainly implemented by querying the gradients of the network. In a black-box attack, the attacker can only query the network's inputs and outputs; black-box attacks are mainly divided into bounded-query and gradient-estimation approaches. Although existing detection schemes distinguish adversarial examples from natural ones accurately, identifying which attack method an attacker used is still challenging. JPEG compression, a commonly used lossy compression method for images, introduces errors in its compression and decompression processes through truncation, rounding, color-space conversion, and quantization; because the quantization step uses different quantization tables for different quality factors, the magnitude of the error varies greatly. Method: To classify the methods that generate adversarial examples, we develop a multi-factor error attention model in which JPEG errors are fed into a neural network. To extract JPEG errors in parallel on a graphics processing unit (GPU), the JPEG compression and decompression processes are simulated with DNN components. Multiple error branches avoid repeated trials over different quality factors, and a multi-factor error attention mechanism balances, per sample, the weights of each quality-factor error branch. A feature statistical layer computes high-dimensional feature vectors from the feature maps; adding an attention mechanism to it yields an attention-based feature statistical layer that lets the feature values adaptively adjust their ratios. The feature map output by the last convolutional layer is fed, channel by channel, into the attention-based feature statistical layer. To build an efficient classifier of adversarial example generation methods, the outputs of the multi-factor error branches are fused, sent through convolutional layers, and then input into the attention-based feature statistical layer. Result: We build 15 sub-datasets from the ImageNet image-classification dataset using 8 popular attack methods. The adversarial examples generated by the fast gradient sign method (FGSM) and the basic iterative method (BIM) form 4 sub-datasets each, with perturbation coefficients of 2, 4, 6, and 8; the Bandits-based adversarial examples form two sub-datasets, the L2 and L∞ versions. Each sub-dataset contains 10 000 training examples and 2 000 test examples. Across the full dataset of 15 sub-datasets, the attack-method recognition rate is above 91%. The accuracy of noise-intensity detection is above 96% on the FGSM and BIM datasets, and the detection accuracy on the adversarial example detection task reaches 96%. The experiments show that the multi-factor error attention network not only classifies adversarial attack methods with high accuracy but also performs well on noise-intensity recognition and adversarial example detection. The comparative analysis demonstrates that the proposed model is not significantly worse than existing schemes on the adversarial example detection task. Conclusion: A multi-factor error attention model is developed for classifying adversarial attack methods, introducing JPEG compression errors as an aid to adversarial example analysis. The proposed model simplifies the extraction of JPEG compression-decompression errors and performs it in parallel on the GPU. The error-branch attention mechanism adaptively balances the weights between the error branches, and the attention-based feature statistical layer enriches the feature types and balances them adaptively.
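To illustrate the kind of GPU-friendly JPEG-error extraction the method relies on, the following is a simplified sketch under stated assumptions: it works on grayscale batches, uses a single uniform quantization step in place of a real per-frequency quantization table, and omits color-space conversion and entropy coding. The function names and the step value are illustrative, not the paper's.

import math
import torch
import torch.nn.functional as F

def dct_basis(n: int = 8) -> torch.Tensor:
    # Orthonormal DCT-II basis; basis @ block @ basis.T is a 2D DCT.
    k = torch.arange(n, dtype=torch.float32)
    basis = torch.cos(math.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    basis[0] /= math.sqrt(2.0)
    return basis * math.sqrt(2.0 / n)

def jpeg_error(x: torch.Tensor, step: float = 16.0) -> torch.Tensor:
    # Round-trip an image batch through 8x8 blockwise DCT quantization and
    # return the residual x - decompress(compress(x)) that feeds an error
    # branch. x: (B, 1, H, W) with H, W divisible by 8, pixels in [0, 255].
    b, _, h, w = x.shape
    basis = dct_basis()
    blocks = F.unfold(x - 128.0, kernel_size=8, stride=8)   # (B, 64, L)
    blocks = blocks.transpose(1, 2).reshape(-1, 8, 8)       # one 8x8 block per row
    coeff = basis @ blocks @ basis.T                        # blockwise DCT
    coeff = torch.round(coeff / step) * step                # uniform quantization
    rec = basis.T @ coeff @ basis                           # inverse DCT
    rec = rec.reshape(b, -1, 64).transpose(1, 2)
    rec = F.fold(rec, output_size=(h, w), kernel_size=8, stride=8) + 128.0
    return x - rec

err = jpeg_error(torch.randint(0, 256, (2, 1, 64, 64)).float())
print(err.shape)   # torch.Size([2, 1, 64, 64])

Running this with several different step values (standing in for several quality factors) would yield the multiple error maps that the multi-factor branches consume in parallel.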
      关键词:image processing;convolutional neural network(CNN);adversarial example;image classification;compression error   
    • Shuwen Zhu,Ge Luo,Ping Wei,Sheng Li,Xinpeng Zhang,Zhenxing Qian
      Vol. 28, Issue 3, Pages: 864-877(2023) DOI: 10.11834/jig.220550
      Image-imperceptible backdoor attacks
摘要:Objective: Deep models are vulnerable to a variety of adversarial attacks. Among them, backdoor attacks make an attacked model behave normally on regular inputs but maliciously once predefined triggers activate the hidden backdoor. Backdoor attacks embed predesigned triggers into a portion of the training data, e.g., specific patterns such as a square, noise, stripes, or warping. To guarantee attack effectiveness while evading human inspection, existing backdoor attacks focus on assigning clean labels to the poisoned samples or on hiding the triggers in the poisoned data. Nevertheless, remaining imperceptible under both visual and statistical scrutiny at the same time is still a challenge. To resolve this problem, we develop an imperceptible and effective backdoor attack that evades human inspection, filters, and statistical detectors. Method: To generate a poisoned sample, a smaller trigger image is hidden inside the host image; the poisoned samples are then mixed with clean samples to form the final training data. Because the trigger is hidden naturally, the poisoned sample looks similar to the corresponding clean sample (image imperceptibility) and also withstands state-of-the-art statistical analysis (statistical imperceptibility). We further develop a one-to-oneself attack paradigm, in which the class whose samples are poisoned is the target class itself. Unlike the previous all-to-one and all-to-all paradigms, a portion of the target class's own images is selected as the pre-poisoned samples. Because these images keep the correct label of the target class, they are imperceptible under human inspection, whereas the classical all-to-one and all-to-all paradigms rely on mismatched or wrong labels and cannot poison the target class from itself. Mislabeled input-label pairs (such as an image of a bird labeled "cat") arouse definite suspicion under human inspection and can reveal the attack; after a filtering process, the remaining samples (mostly clean) would invalidate it. We can also launch a quick attack on a pre-trained model by fine-tuning it with the same poisoned data. Our accuracy-consistent backdoor attack is thus imperceptible in the label, image, and statistical senses. Result: To verify the effectiveness and invisibility of the proposed method, experiments compare it with three popular methods on the ImageNet, MNIST, and CIFAR-10 datasets. The one-to-oneself attack backdoors a high-accuracy model by poisoning only a small proportion (7%) of the original clean samples on ImageNet, MNIST, and CIFAR-10. On all three datasets, the backdoor is not activated when clean samples are tested, and the poisoned model's accuracy drops by less than 1% compared with the clean model. Note that while some backdoor attacks change the label of a poisoned image to the target label, so that mislabeled input-label pairs are easily detected in practice, we do not modify the labels of trigger-injected images: every input-label pair in our training sets is correctly matched. Under the classical all-to-one setting, the proposed method classifies clean samples with the same accuracy and achieves comparable attack success rates (more than 99%) on poisoned samples. Unlike BadNets, the trigger is invisible to human visual inspection: the embedded trigger is imperceptible, and the poisoned image looks natural and is hard to distinguish from the original clean image. We also quantify invisibility with the learned perceptual image patch similarity (LPIPS), peak signal-to-noise ratio (PSNR), and structural similarity (SSIM) metrics. Compared with the three baseline methods, the mean distance between our poisoned images and the originals is almost zero, with a near-zero LPIPS value; with the highest SSIM values on all three datasets, our poisoned samples are the most similar to their benign counterparts. Meanwhile, our attack achieves the highest PSNR values (more than 43 dB on ImageNet, MNIST, and CIFAR-10), reaching 52.4 dB on MNIST. Conclusion: An imperceptible backdoor attack is proposed in which the poisoned image carries a valid label and an invisible trigger. Triggers are embedded invisibly through data hiding, so the poisoned images stay similar to the original clean ones; the user remains unaware of any abnormality throughout, while other attackers cannot exploit the trigger. A new attack paradigm, the one-to-oneself attack, is designed for clean-label backdoor attacks: the original label stays consistent when the selected trigger poisons the images. Owing to this paradigm, most defenses that assume poisoned samples carry changed labels become invalid. Altogether, the proposed backdoor attack achieves imperceptibility in the label, image, and statistical senses.
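The abstract does not specify the hiding scheme, so as an illustration of the general idea — a smaller trigger image hidden invisibly inside a host image whose label stays correct — here is a classic least-significant-bit (LSB) substitution sketch in NumPy. It is a hypothetical stand-in, not the paper's embedding method; the corner placement and 2-bit depth are arbitrary choices.

import numpy as np

def embed_trigger(host: np.ndarray, trigger: np.ndarray, bits: int = 2) -> np.ndarray:
    # Hide the trigger's most significant bits in the host's least significant
    # bits (top-left corner). host: (H, W) uint8, trigger: (h, w) uint8.
    out = host.copy()
    h, w = trigger.shape
    keep = (0xFF << bits) & 0xFF                 # mask that clears the low bits
    out[:h, :w] = (out[:h, :w] & keep) | (trigger >> (8 - bits))
    return out

def extract_trigger(img: np.ndarray, shape: tuple, bits: int = 2) -> np.ndarray:
    # Recover an approximation of the trigger from the stego image.
    h, w = shape
    return (img[:h, :w] & ((1 << bits) - 1)) << (8 - bits)

host = np.random.randint(0, 256, (32, 32), dtype=np.uint8)
trigger = np.random.randint(0, 256, (8, 8), dtype=np.uint8)
poisoned = embed_trigger(host, trigger)          # the label stays the correct one
assert np.abs(poisoned.astype(int) - host.astype(int)).max() < 2 ** 2

Because only the two lowest bit planes change, the per-pixel distortion is at most 3 gray levels, which is why such poisoned samples can score near-zero LPIPS and high PSNR while still carrying a recoverable trigger.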
      关键词:backdoor attack;imperceptible trigger;attack paradigm;clean label;statistical imperceptibility   