End-to-end dual-channel feature reweight DenseNet for image classification

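Guo Yurong1, Zhang Ke1, Wang Xinsheng1, Yuan Jinsha1, Zhao Zhenbing1, Ma Zhanyu2 (1. Department of Electronic and Communication Engineering, North China Electric Power University, Baoding 071000; 2. School of Information and Communication Engineering, Institute of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100086)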

Abstract
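Objective Densely connected convolutional networks (DenseNet) do not fully account for the correlation among channel features or the correlation among the features of different layers. To address this shortcoming, this paper combines DenseNet with a soft attention mechanism and proposes an end-to-end dual-channel feature reweight densely connected convolutional network.

Method The proposed network performs channel feature reweighting and inter-layer feature reweighting of DenseNet at the same time. Methods for channel feature reweighting and inter-layer feature reweighting in DenseNet are given, and the end-to-end dual-channel feature reweight network is then constructed: the output feature map of each convolutional layer passes through two channels that perform channel feature reweighting and inter-layer feature reweighting, respectively, and the two reweighted feature maps are then fused.

Result To verify the effectiveness and adaptability of the method on different image classification datasets, experiments were carried out on the image classification datasets CIFAR-10/100 and on the face age datasets MORPH and Adience. Classification accuracy improved, and the parameter count, training time, and test time of the model were analyzed to verify the practicality of the method. Compared with DenseNet, the 40-layer and 64-layer dual feature reweight DenseNet (DFR-DenseNet) increased the parameter count by only 1.87% and 1.23%, respectively, on CIFAR-10 while lowering the error rate by 12% and 9.11%, respectively; on CIFAR-100, the error rate decreased by 5.56% and 5.41%, respectively. Compared with the 121-layer DenseNet, the mean absolute error (MAE) on MORPH decreased by 7.33%, and the age group estimation accuracy on Adience increased by 2%. Compared with the multiple feature reweight DenseNet (MFR-DenseNet), DFR-DenseNet halved the number of parameters, and its test time was shortened to approximately 61% of that of MFR-DenseNet.

Conclusion The experimental results show that the end-to-end dual-channel feature reweight DenseNet strengthens the learning ability of the network, improves image classification accuracy, and shows a degree of adaptability and practicality across different image classification datasets.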
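The two-branch reweighting described in the Method paragraph can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the squeeze-and-excitation style gating, the bottleneck ratio, the single scalar gate used for the inter-layer branch, and the 1×1 kernel of the fusion convolution are all assumptions, and `init_params`, `excitation`, and `dual_feature_reweight` are hypothetical names with random weights standing in for learned ones.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(channels, reduction=4, seed=0):
    """Random illustrative weights (hypothetical helper, not from the paper)."""
    rng = np.random.default_rng(seed)
    hidden = max(channels // reduction, 1)
    return {
        "c_w1": 0.1 * rng.standard_normal((channels, hidden)),
        "c_w2": 0.1 * rng.standard_normal((hidden, channels)),
        "l_w1": 0.1 * rng.standard_normal((channels, hidden)),
        "l_w2": 0.1 * rng.standard_normal((hidden, 1)),
        "fuse": 0.1 * rng.standard_normal((channels, 2 * channels)),
    }

def excitation(z, w1, w2):
    """Two fully connected layers: ReLU bottleneck, then a sigmoid gate in (0, 1)."""
    return sigmoid(np.maximum(z @ w1, 0.0) @ w2)

def dual_feature_reweight(x, p):
    """x: output feature map of one convolutional layer, shape (C, H, W)."""
    z = x.mean(axis=(1, 2))  # squeeze: global average pooling, shape (C,)

    # Branch 1, channel reweight: one gate per channel.
    s_channel = excitation(z, p["c_w1"], p["c_w2"])          # (C,)
    x_channel = x * s_channel[:, None, None]

    # Branch 2, inter-layer reweight: one scalar gate for the whole layer
    # (assumed form; the abstract only states that a per-layer weight is obtained).
    s_layer = excitation(z, p["l_w1"], p["l_w2"])            # (1,)
    x_layer = x * s_layer[0]

    # Fusion: concatenate along channels, then a 1x1 convolution back to C channels
    # (the 1x1 kernel size is an assumption).
    stacked = np.concatenate([x_channel, x_layer], axis=0)   # (2C, H, W)
    fused = np.einsum("oc,chw->ohw", p["fuse"], stacked)     # (C, H, W)
    return fused, s_channel, s_layer

x = np.random.default_rng(1).standard_normal((12, 8, 8))     # 12 channels, 8x8 map
fused, s_channel, s_layer = dual_feature_reweight(x, init_params(12))
print(fused.shape, s_channel.shape, s_layer.shape)
```

Because both branches share a single forward pass and are merged by the fusion step, a network built this way could be trained end to end, which is the property the abstract contrasts with the two separately trained models of MFR-DenseNet.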
Image classification method based on end-to-end dual feature reweight DenseNet

Guo Yurong1, Zhang Ke1, Wang Xinsheng1, Yuan Jinsha1, Zhao Zhenbing1, Ma Zhanyu2 (1. Department of Electronic and Communication Engineering, North China Electric Power University, Baoding 071000, China; 2. School of Information and Communication Engineering, Institute of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100086, China)

Abstract
Objective Image classification is an important research area in computer vision, and the development of deep learning and convolutional neural networks (CNNs) has laid its technical foundation. In recent years, image classification methods based on deep CNNs have become an important research topic. DenseNet is one of the most widely applied deep CNNs in image classification; it encourages feature reuse and alleviates the vanishing gradient problem. However, it has two clear limitations. First, each layer simply concatenates the feature maps obtained from preceding layers without considering the interdependencies between channels; the representation power of the network can be improved by modeling the correlation among feature channels and recalibrating the channel features. Second, the correlation among the feature maps of different layers is not explicitly modeled; it is therefore important to model this correlation and learn the correlation coefficients adaptively.

Method Conventional DenseNet does not adequately consider channel feature correlation or interlayer feature correlation. To address this, the multiple feature reweight DenseNet (MFR-DenseNet) combines a channel feature reweight DenseNet (CFR-DenseNet) and an interlayer feature reweight DenseNet (ILFR-DenseNet) through ensemble learning, improving the representation power of DenseNet by adaptively recalibrating the channel-wise feature responses and explicitly modeling the interdependencies between the features of different convolutional layers. However, MFR-DenseNet uses two independent parallel networks and is not trained end to end, which has two drawbacks. First, the training process is cumbersome: the CFR-DenseNet and ILFR-DenseNet models must be trained and saved separately, and their models and weights must then be loaded, so MFR-DenseNet requires multiple save and load steps. Second, the parameter count and computational cost are large, so training takes a long time. At test time, the final prediction of MFR-DenseNet is the average of the predictions of the two models, so the parameter count and test time are almost double those of a single channel feature reweight or interlayer feature reweight network. MFR-DenseNet therefore places high demands on the storage space and computing performance of the device, which limits its practical application.

To address these limitations of MFR-DenseNet, this paper proposes an end-to-end dual feature reweight DenseNet (DFR-DenseNet) based on the soft attention mechanism. The network implements both channel feature reweighting and interlayer feature reweighting of DenseNet. First, the channel feature reweight and interlayer feature reweight methods are integrated into DenseNet. A squeeze-and-excitation module (SEM) is introduced after each 3×3 convolutional layer to exploit the channel dependencies: in the SEM, each feature map of each layer obtains a weight through a squeeze operation and an excitation operation, and explicitly modeling the interdependencies between channels improves the representation of the network. For interlayer reweighting, the output feature map of the convolutional layer is subjected to two squeeze-and-excitation operations, which yields a weight for each layer and thus reweights the interlayer features. Then, DFR-DenseNet is constructed: the output feature map of each convolutional layer passes through two channels that perform channel feature reweighting and interlayer feature reweighting, and concatenation and convolution operations combine the two types of reweighted feature maps.

Result First, DFR-DenseNet is compared with a serial fusion method and a parallel-addition fusion method on the image classification dataset CIFAR-10; DFR-DenseNet is the most effective of the three. Second, to demonstrate the advantage of DFR-DenseNet, experiments were performed on the image classification datasets CIFAR-10/100. To show the effectiveness of the method on high-resolution data, an age estimation experiment was conducted on the face dataset MORPH, and an age group classification comparison was performed on the unconstrained Adience dataset. The image classification accuracy improved. The 40-layer DFR-DenseNet achieved a 4.69% error on CIFAR-10, lowering the error rate of the 40-layer DenseNet by 12% with only 1.87% more parameters. The 64-layer DFR-DenseNet achieved a 4.29% error on CIFAR-10, lowering the error rate of the 64-layer DenseNet by 9.11%. On CIFAR-100, the 40-layer and 64-layer DFR-DenseNet achieved test errors of 24.29% and 21.86%, lowering the error rates of the 40-layer and 64-layer DenseNet by 5.56% and 5.41%, respectively.

Age estimation from a single face image is an essential task in human-computer interaction and computer vision and has a wide range of practical applications. It falls into two categories: age classification and age regression. Adience is used for age group classification, where DFR-DenseNet obtained 58.79% accuracy. MORPH Album 2 is used for age regression, where the 121-layer DFR-DenseNet achieved a mean absolute error of 3.16, 7.33% lower than that of the 121-layer DenseNet. Compared with MFR-DenseNet, DFR-DenseNet halved the number of parameters, and its test time was shortened to approximately 61% of that of MFR-DenseNet.

Conclusion The experimental results show that the end-to-end dual feature reweight DenseNet can enhance the learning ability of the network and improve the accuracy of image classification.
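The percentage improvements above read most naturally as relative reductions with respect to the DenseNet baseline, although the abstract does not say so explicitly. Under that assumption, the short sketch below shows how such a figure is computed and backs out the baseline value each reported pair would imply. The implied baselines are derived here for illustration only and are not figures reported by the authors; `relative_reduction` and `implied_baseline` are hypothetical helper names.

```python
def relative_reduction(baseline, value):
    """Fractional decrease of `value` with respect to `baseline`."""
    return (baseline - value) / baseline

def implied_baseline(value, reduction):
    """Baseline that gives `value` after a relative reduction of `reduction`."""
    return value / (1.0 - reduction)

# (DFR-DenseNet result, reported reduction) pairs taken from the abstract.
reported = {
    "CIFAR-10, 40 layers (error %)": (4.69, 0.12),
    "CIFAR-10, 64 layers (error %)": (4.29, 0.0911),
    "CIFAR-100, 40 layers (error %)": (24.29, 0.0556),
    "CIFAR-100, 64 layers (error %)": (21.86, 0.0541),
    "MORPH Album 2, 121 layers (MAE)": (3.16, 0.0733),
}

for name, (value, reduction) in reported.items():
    # Assumption: the reduction is relative, so baseline = value / (1 - reduction).
    base = implied_baseline(value, reduction)
    print(f"{name}: DFR-DenseNet {value:.2f}, implied DenseNet baseline about {base:.2f}")
```

Read this way, a 12% reduction on CIFAR-10 corresponds to well under one percentage point of absolute error, which is worth keeping in mind when comparing the gains across datasets.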
