
Yuan Heng, Liu Jie, Jiang Wentao, Liu Wanjun (Liaoning Technical University)

Abstract
Objective To address the problems that insufficient information interaction between channels in residual image classification networks leaves channel features underexploited, and that the residual structure loses part of the feature information, this paper proposes a double-pooling residual classification network with a feature reordering attention mechanism (FDPRNet). Method FDPRNet is built on the ResNet-34 residual network. First, the 7×7 kernel of the first convolutional layer is replaced with a 3×3 kernel, enlarging the effective receptive field of the subsequent convolutional layers and strengthening the network's nonlinear expressiveness; the max-pooling layer is also removed to improve the capture of local detail. Then, a feature reordering attention module (FRAM) is proposed: the feature-map channels are grouped and reordered both between and within groups, and the features of each channel combination are extracted by one-dimensional convolution and concatenated to obtain the weights of the reordered features. Finally, a double-pooling residual (DPR) module is proposed, which applies max pooling and average pooling to the feature map in parallel, then element-wise adds the pooled feature maps and applies a convolutional mapping to extract key features and reduce the feature-map size. Result Compared with 11 other image classification networks on the CIFAR-100, CIFAR-10, and SVHN datasets, FDPRNet improves accuracy over the second-best model, RTSA Net-101 (residual Net-101 with tensor-synthetic attention), by 1.16%, 1.01%, and 0.98%, respectively; the experimental results show that FDPRNet significantly improves classification accuracy. Conclusion The proposed FDPRNet strengthens information exchange within image channels and reduces feature loss; it not only achieves a high level of classification accuracy but also markedly improves the model's generalization ability.
Double-pooling residual classification network of feature reordering attention mechanism

Yuan Heng, Liu Jie, Jiang Wentao, Liu Wanjun (Liaoning Technical University)

Abstract: Objective The residual classification network is a deep convolutional neural network architecture that plays an important and influential role in deep learning and has become one of the most commonly used structures for image classification tasks in computer vision. To solve the problem of network degradation in deep networks, residual networks, unlike the traditional approach of simply stacking convolutional layers, innovatively introduce residual connections: input features are added directly to output features through skip connections, passing the original features straight to subsequent layers. This forms a shortcut path that better preserves and utilizes feature information. Although the residual classification network effectively alleviates gradient explosion and vanishing during deep network training, when the output dimension of a residual block does not match its input dimension, a convolutional mapping is needed to align the dimensions, which causes a large number of pixels in the channel matrix of the residual module to be skipped and thus loses feature information. In addition, image channels are correlated, and a fixed channel order may introduce feature bias, making it difficult to fully exploit information from other channels and limiting the model's ability to express key features. To address these issues, this article proposes a double-pooling residual classification network with a feature reordering attention mechanism (FDPRNet). Method FDPRNet is based on the ResNet-34 residual network. First, the kernel size of the first convolutional layer is replaced from 7×7 to 3×3, because, for relatively small images, a larger kernel enlarges the receptive field and captures too much useless contextual information.
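As a quick sanity check of the stem modification described above (not part of the paper), the standard output-size formula shows why the original ResNet-34 stem shrinks a 32×32 CIFAR image to 8×8, while the modified 3×3 stride-1 stem with max pooling removed keeps it at 32×32:

```python
# Sanity check (illustration only): spatial sizes of a 32x32 input after the
# original ResNet-34 stem vs. the modified stem described in the abstract.

def conv_out(n: int, k: int, s: int, p: int) -> int:
    """Output size of a conv/pool layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

# Original stem: 7x7 conv (stride 2, pad 3) followed by 3x3 max pool (stride 2, pad 1).
after_conv7 = conv_out(32, k=7, s=2, p=3)           # -> 16
after_pool = conv_out(after_conv7, k=3, s=2, p=1)   # -> 8

# Modified stem: a single 3x3 conv (stride 1, pad 1), max pooling removed.
after_conv3 = conv_out(32, k=3, s=1, p=1)           # -> 32

print(after_conv7, after_pool, after_conv3)         # 16 8 32
```

Retaining the full 32×32 resolution is what gives the subsequent layers more local detail to work with, at the cost of more computation per layer.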
At the same time, the max-pooling layer is removed to prevent the feature map from shrinking further, retaining more image information, avoiding the information loss caused by pooling, and allowing subsequent layers to extract features more effectively. Then, a feature reordering attention module (FRAM) is proposed. It groups the feature-map channels and reorders them both between and within groups so that adjacent channels are no longer contiguous, arranging the channels within each group as an arithmetic sequence with a step size of 1. This operation disrupts part of the original channel order while preserving part of the ordering relationships between channels, introducing a degree of randomness that lets the model consider the interactions among different channels comprehensively and avoids over-dependence on specific channels. The features of each channel combination are extracted by one-dimensional convolution and concatenated, and a sigmoid activation then yields the weights of the reordered features, which are multiplied element-wise with the input features to produce the output feature map of the feature reordering attention mechanism. Finally, a double-pooling residual (DPR) module is proposed, which applies max pooling and average pooling to the feature map in parallel. The module thus obtains both the salient and the typical features of the input image, enhancing the expressive power of the features and helping the network capture important information in the image, thereby improving model performance. Element-wise summation and convolutional mapping are then performed on the pooled feature maps to extract key features, reduce the feature-map size, and ensure that the channel matrices support element-level summation in the residual connection.
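The FRAM pipeline described above (group, reorder, 1D-convolve, sigmoid, rescale) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: it uses a channel-shuffle-style permutation in place of the paper's exact reordering rule, a uniform kernel in place of the learned 1D-convolution weights, and global average pooling to form the per-channel descriptors.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fram(x, groups=4, k=3):
    """Minimal sketch of the FRAM idea. x has shape (C, H, W);
    the permutation and kernel below are placeholder assumptions."""
    C = x.shape[0]
    desc = x.mean(axis=(1, 2))                    # per-channel descriptor (global average pool)
    # Inter-/intra-group reordering: a channel-shuffle-style permutation
    # (an assumption; the paper's exact rule may differ).
    perm = np.arange(C).reshape(groups, C // groups).T.reshape(-1)
    shuffled = desc[perm]
    # 1D convolution over the reordered descriptors; uniform weights stand in
    # for the learned kernel.
    w = np.full(k, 1.0 / k)
    conv = np.convolve(shuffled, w, mode="same")
    # Map back to the original channel order, squash to (0, 1) attention
    # weights, and rescale the input channel-wise.
    weights = sigmoid(conv[np.argsort(perm)])
    return x * weights[:, None, None]
```

Because the convolution slides across the *reordered* descriptors, each output weight mixes information from channels that were not adjacent in the original layout, which is the cross-channel interaction the module is designed to create.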
Result On the CIFAR-100, CIFAR-10, SVHN, Flowers-102, and NWPU-RESISC45 datasets, compared with the original ResNet-34, adding the feature reordering attention module FRAM improves accuracy by 1.66%, 0.19%, 0.13%, 4.28%, and 2.00%, respectively; adding the double-pooling residual module DPR improves accuracy by 1.70%, 0.26%, 0.12%, 3.18%, and 1.31%, respectively; and FDPRNet, which combines the FRAM and DPR modules, improves accuracy by 2.07%, 0.30%, 0.17%, 8.31%, and 2.47%, respectively. Compared with four attention mechanisms, SE (squeeze and excitation), ECA (efficient channel attention), CA (coordinate attention), and CBAM (convolutional block attention module), FRAM improves accuracy by an average of 0.72%, 1.28%, and 1.46% on the CIFAR-100, Flowers-102, and STL-10 datasets, respectively. In summary, on both small and large datasets and on datasets with few or many categories, the FRAM and DPR modules each improve the recognition accuracy of the ResNet-34 network. Their combination, FDPRNet, improves the recognition rate the most, and it achieves a significant accuracy gain over other image classification networks. Conclusion The FDPRNet proposed in this article can enhance information exchange within image channels and reduce feature loss. It not only achieves a high level of classification accuracy but also effectively enhances the network's feature learning ability and the model's generalization ability. The main contributions of this article are as follows: 1) The feature reordering attention module FRAM is proposed, which breaks the connections between the original channels and groups them according to certain rules.
By learning the weights of channel combinations in different orders, channels in different groups interact without losing the front-to-back connections among all channels, achieving information exchange and channel crossing within the feature map, strengthening the interaction between features, better capturing the correlation between contextual information and features, and improving classification accuracy. 2) A double-pooling residual module DPR is proposed, which replaces the skip connections in the original residual block, solving the feature information loss caused by the large number of pixels skipped in the channel matrix during the skip connections of the residual module. Using double pooling to obtain the salient and typical features of the input image not only enhances the expressive power of the features but also helps the network better capture important information in images and improves classification performance. 3) The proposed FDPRNet inserts the FRAM and DPR modules into the residual network to enhance channel interaction and feature expression, enabling the model to capture complex relationships with strong generalization ability. It achieves high classification accuracy on several mainstream image classification datasets.
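The DPR shortcut described in contribution 2 (parallel max and average pooling, element-wise sum, then a convolutional mapping to match dimensions) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: 2×2 pooling windows with stride 2, and a plain weight matrix standing in for the learned 1×1 convolution.

```python
import numpy as np

def dpr_shortcut(x, w):
    """Sketch of the double-pooling residual shortcut (assumptions noted above).
    x: input of shape (C, H, W) with even H and W;
    w: (C_out, C) matrix standing in for a learned 1x1 convolution."""
    C, H, W = x.shape
    # Split the spatial dims into non-overlapping 2x2 windows.
    windows = x.reshape(C, H // 2, 2, W // 2, 2)
    max_pool = windows.max(axis=(2, 4))      # salient features
    avg_pool = windows.mean(axis=(2, 4))     # typical features
    fused = max_pool + avg_pool              # element-wise sum of both branches
    # 1x1 convolution as a channel-mixing matrix multiply, aligning the
    # shortcut with the residual block's output dimensions.
    return np.einsum("oc,chw->ohw", w, fused)
```

Unlike a strided 1×1 convolution shortcut, which reads only one pixel per 2×2 window and discards the rest, every input pixel here contributes through the average-pooling branch, which is the feature-loss problem the module targets.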