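Objective To address the subjective singularity and objective insufficiency of feature extraction from ancient mural images, we propose a convolutional neural network model for the automatic classification of ancient mural images that builds on the classical AlexNet model and incorporates feature fusion. Method First, because large mural datasets are currently scarce, the model enlarges the dataset by applying image augmentation (scaling, brightness transformation, noise addition, and flipping) to the mural samples. Next, first-stage low-level features such as edges are extracted from the mural images. A two-channel network whose channels differ in structure then abstracts these first-stage features in a second stage, yielding one set of features per channel. Finally, the features of the two channels are fused and used jointly to construct the loss function and produce the classification result, which improves the robustness and feature-expression ability of the model. Result Experiments on the constructed mural image dataset show that the model achieves 84.24% accuracy, 84.15% recall, and an F1-score of 84.13%. Compared with AlexNet and several improved convolutional neural network models, every evaluation metric improves by about 5%, which verifies the soundness and effectiveness of the proposed model for automatic mural image classification. Conclusion The proposed mural classification model accounts for the effects of both network width and depth and can extract rich detail features of mural images from multiple local perspectives. It has certain advantages and practical value and can be further integrated into related mural image classification models.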
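The augmentation step can be illustrated with a minimal sketch. The paper does not specify its zoom factors, brightness factors, or noise level. The values below are therefore hypothetical, as are the helper names: a 1.5x nearest-neighbour zoom, brightness factors of 1.2 and 0.8, and Gaussian noise with sigma = 10. Images are modelled as grayscale lists of pixel rows so that the example needs only the Python standard library. A practical pipeline would use an image-processing library and RGB input.

```python
import random
from typing import List

# A grayscale image as rows of pixel intensities in [0, 255].
Image = List[List[float]]


def clamp(value: float) -> float:
    """Keep a pixel inside the valid 8-bit intensity range."""
    return max(0.0, min(255.0, value))


def hflip(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def adjust_brightness(img: Image, factor: float) -> Image:
    """Scale every pixel by `factor` (>1 brightens, <1 darkens), then clamp."""
    return [[clamp(p * factor) for p in row] for row in img]


def add_gaussian_noise(img: Image, sigma: float, rng: random.Random) -> Image:
    """Add zero-mean Gaussian noise with standard deviation `sigma`, then clamp."""
    return [[clamp(p + rng.gauss(0.0, sigma)) for p in row] for row in img]


def zoom(img: Image, scale: float) -> Image:
    """Nearest-neighbour resize of the whole image by `scale`."""
    h, w = len(img), len(img[0])
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    return [
        [img[min(h - 1, int(r / scale))][min(w - 1, int(c / scale))] for c in range(new_w)]
        for r in range(new_h)
    ]


def augment(img: Image, rng: random.Random) -> List[Image]:
    """Return augmented copies of one sample (all parameter values are hypothetical)."""
    return [
        zoom(img, 1.5),
        adjust_brightness(img, 1.2),
        adjust_brightness(img, 0.8),
        add_gaussian_noise(img, sigma=10.0, rng=rng),
        hflip(img),
    ]
```

Calling `augment(sample, random.Random(0))` on one image yields five additional training samples. Seeding the generator keeps the noisy copy reproducible across runs.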
Objective Chinese ancient murals, a form of painting executed on walls, have a history of 4,000 years and are an indispensable part of ancient Chinese painting. As digital mural images become increasingly abundant, the need to classify mural resources grows ever more urgent. The core of mural image classification lies in constructing a feature description of each object. Such a description must represent the object adequately and must also distinguish between different types of objects. However, because ancient murals were drawn by hand, their images exhibit a degree of diversity and subjectivity. To address the subjective singularity and objective insufficiency of traditional mural image feature extraction, we propose a convolutional neural network for the automatic classification of ancient mural images that builds on the classical AlexNet model and the idea of feature fusion. Method First, based on experiments, we adopt Adam as the optimizer with a learning rate of 0.001. On this basis, we use the features of each convolutional layer of AlexNet for classification. By comparing running time and accuracy, we then select the convolutional layer that best expresses mural features. Second, following the idea of feature fusion, we swap the order of two convolution kernels to form channel 1 and channel 2. The kernel sizes of channel 1 are 11, 5, and 3, and those of channel 2 are 11, 3, and 5. Together, the two channels constitute a two-channel convolutional feature extraction layer that lets the model make full use of multiple local features. Third, too many fully connected layers cause overfitting, so we compare the features obtained with different fully connected layers on top of the two-channel convolutional feature extraction layer. We then select the fully connected layer features that best express mural images.
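The two-channel design can be checked with standard convolution arithmetic, out = floor((n + 2p - k) / s) + 1. This sketch rests on several assumptions that the abstract does not state:

- a 227x227 input;
- a shared AlexNet-style first layer (11x11 kernel, stride 4, followed by 3x3 max pooling with stride 2);
- stride-1 convolutions with "same" padding inside each channel;
- hypothetical channel widths of 256 and 384;
- fusion by concatenation along the depth axis.

Channel 1 reuses the kernel sizes of AlexNet's conv1 to conv3 (11, 5, 3), and channel 2 swaps the last two. Sharing the 11x11 layer gives 1 + 2 + 2 = 5 convolutional layers. This is one reading consistent with the 5 convolutional layers reported for the two-channel part.

```python
from typing import List, Tuple

# (kernel size, output channels) for the layers after the shared stem.
# Channel widths 256/384 are hypothetical, borrowed from AlexNet conv2/conv3.
CHANNEL_1: List[Tuple[int, int]] = [(5, 256), (3, 384)]  # kernels 11 -> 5 -> 3
CHANNEL_2: List[Tuple[int, int]] = [(3, 256), (5, 384)]  # kernels 11 -> 3 -> 5


def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output size of a convolution or pooling window: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def shared_stem(size: int) -> int:
    """Assumed shared first stage: 11x11 conv, stride 4, then 3x3 max pool, stride 2."""
    size = conv_out(size, kernel=11, stride=4)
    return conv_out(size, kernel=3, stride=2)


def run_channel(size: int, layers: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Apply stride-1 'same'-padded convolutions; return (spatial size, depth)."""
    depth = 0
    for kernel, channels in layers:
        size = conv_out(size, kernel, stride=1, padding=kernel // 2)
        depth = channels
    return size, depth


def fused_shape(input_size: int = 227) -> Tuple[int, int, int]:
    """Shape (H, W, C) after concatenating the two channels along the depth axis."""
    stem = shared_stem(input_size)
    s1, d1 = run_channel(stem, CHANNEL_1)
    s2, d2 = run_channel(stem, CHANNEL_2)
    if s1 != s2:
        raise ValueError("concatenation needs equal spatial sizes in both channels")
    return s1, s1, d1 + d2


def conv_layer_count() -> int:
    """One shared conv layer plus the convolutions in each channel."""
    return 1 + len(CHANNEL_1) + len(CHANNEL_2)
```

Under these assumptions, each channel outputs 27x27 maps and the fused tensor is 27x27x768. Even without padding, the two kernel orders would end at the same size (27 → 23 → 21 versus 27 → 25 → 21). The reason is that each stride-1 convolution shrinks the map by k - 1 regardless of the order in which the kernels appear.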
Finally, we present a mural image classification model that combines the two-channel convolutional layer with the optimal fully connected layers. The proposed model works in three stages. 1) Mural image preprocessing. Because large mural datasets are lacking, we enlarge the set of mural samples with image augmentation operations such as zooming, brightness transformation, noise addition, and flipping. From these samples we construct an ancient mural image dataset covering eight classes: Buddha, bodhisattva, Buddhist disciples, secular figures, animals, plants, buildings, and auspicious clouds. 2) Training. This stage has three steps. In the first step, the model extracts low-level features, such as edge information, from the training images. In the second step, the two differently structured channels further abstract the first-step features, yielding one set of features per channel. In the last step, the features of the two channels are fused and used to construct the loss function for training the network. Feature fusion improves both the robustness and the feature-expression ability of the model. 3) Testing. The trained network predicts the classes of the test-set samples, and the classification accuracy, recall, and F1-score are computed from these predictions. Result A comparison of running time and accuracy across convolutional layers shows that, in the AlexNet model, the third convolutional layer is the most suitable for this dataset. Accuracy decreases when the number of layers is either higher or lower than the number chosen in this paper.
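The fusion-and-loss step of training can be sketched as follows. The abstract does not say how the two channel features are fused, so concatenation is an assumption here. A single fully connected layer stands in for the three used in the paper. The softmax cross-entropy loss is also an assumption, since the paper states only that the loss is built from the fused features. The class names follow the eight categories of the constructed dataset, and the helper names are illustrative.

```python
import math
from typing import List, Sequence

# The eight classes of the constructed mural dataset.
CLASSES = ["buddha", "bodhisattva", "buddhist disciple", "secular figure",
           "animal", "plant", "building", "auspicious cloud"]


def fuse(feat1: Sequence[float], feat2: Sequence[float]) -> List[float]:
    """Fuse the two channel feature vectors by concatenation (assumed fusion rule)."""
    return list(feat1) + list(feat2)


def linear(x: Sequence[float], weights: Sequence[Sequence[float]],
           bias: Sequence[float]) -> List[float]:
    """One fully connected layer: one weight row and one bias per output class."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert logits to class probabilities; subtracting the max avoids overflow."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy loss for one sample with true class index `target`."""
    return -math.log(softmax(logits)[target])


def fused_loss(feat1: Sequence[float], feat2: Sequence[float],
               weights: Sequence[Sequence[float]], bias: Sequence[float],
               target: int) -> float:
    """Loss computed from the fused features of both channels."""
    return cross_entropy(linear(fuse(feat1, feat2), weights, bias), target)
```

With all-zero weights and biases, every class receives probability 1/8, so the loss equals ln 8 ≈ 2.079 whatever the true class is. Training then lowers the loss by increasing the logit of the true class.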
Similarly, comparisons of different fully connected layers show that, on top of the two-channel convolutional extraction layer, the features from three fully connected layers are more stable and more sufficient. We therefore present a 6-layer convolutional neural network model consisting of 3 dual-channel layers and 3 fully connected layers, with 5 convolutional layers in the two-channel part. On the constructed mural image dataset, the model achieves 84.24% accuracy, 84.15% recall, and an F1-score of 84.13%. Compared with AlexNet and several improved convolutional neural network models, the proposed model achieves the highest accuracy in most classes and improves each evaluation metric by about 5%. These results verify the effectiveness of the model for automatic mural image classification. Conclusion By taking both network width and depth into account, the proposed AlexNet-based ancient mural classification model with feature fusion can fully express the rich details of mural images. The model has certain advantages and application value and can be further integrated into related mural classification models. However, the method is a shallow convolutional neural network based on AlexNet and therefore fails to fully mine the high-level features of mural images. As a result, some images with similar low-level features, such as color and texture, are misclassified. Moreover, classification with this model takes hours to run, which consumes considerable resources and is inefficient. In future work, we will therefore incorporate deeper models to express the high-level features of mural images. We will also improve the efficiency of model training so that mural classification becomes more effective and faster.
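The reported accuracy, recall, and F1-score can be computed from test-set predictions as shown below. The abstract does not state how per-class scores are averaged, so this sketch assumes macro averaging, an unweighted mean over classes. The function names are illustrative.

```python
from typing import Dict, Hashable, Sequence


def safe_div(num: float, den: float) -> float:
    """Division that returns 0 when the denominator is empty (e.g. a class never predicted)."""
    return num / den if den else 0.0


def evaluate(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Dict[str, float]:
    """Accuracy plus macro-averaged recall and F1 (macro averaging is assumed)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need equally long, non-empty label sequences")
    labels = set(y_true) | set(y_pred)
    tp = {c: 0 for c in labels}
    fp = {c: 0 for c in labels}
    fn = {c: 0 for c in labels}
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # class p was predicted wrongly
            fn[t] += 1  # true class t was missed
    recalls, f1s = [], []
    for c in labels:
        precision = safe_div(tp[c], tp[c] + fp[c])
        recall = safe_div(tp[c], tp[c] + fn[c])
        recalls.append(recall)
        f1s.append(safe_div(2 * precision * recall, precision + recall))
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "recall": sum(recalls) / len(labels),
        "f1": sum(f1s) / len(labels),
    }
```

Consider `y_true = [0, 0, 1, 1]` and `y_pred = [0, 1, 1, 1]`. Accuracy is 0.75, macro recall is 0.75, and macro F1 is about 0.733. Weighting classes by their support instead would change the recall and F1 values whenever the classes are imbalanced.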