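Zhao Xuemei, Wu Jun, Chen Ruixing (School of Electronic Engineering and Automation, Guilin University of Electronic Technology, Guilin 541004)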
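Existing convolutional neural networks (CNNs) stack convolutional layers and activation functions to build a complex nonlinear function that fits the mapping from input data to output labels. This end-to-end learning scheme severely hinders the fusion of CNN feature maps with prior knowledge. As a result, CNNs are sensitive to the quantity and quality of training samples, and their feature maps are harder to interpret. Starting from the way deep learning models are built, and taking the feature representation of remote sensing images and its interpretability as the entry point, this paper builds a bridge between traditional prior knowledge of remote sensing images and CNNs. It analyzes how the Riemannian manifold feature space (RMFS) promotes the interpretability of CNNs and the understanding of how features evolve. A new remote sensing image classification framework, RMFS-CNN, is proposed by fusing CNNs with RMFS. With RMFS serving as a transitional feature platform, the framework on the one hand exploits the linear distribution of features in RMFS to reduce the difficulty for CNNs of learning traditional image features, and on the other hand defines representation paradigms that highlight image prior knowledge, improving the ability of CNNs to learn interpretable features. The goal is to use the strong capability of RMFS in expressing prior knowledge (features) to improve how efficiently CNNs use features for remote sensing image classification. Based on the RMFS feature representation paradigms, loss functions that control the feature-learning preference of CNNs are defined, leading to CNN classification models with good feature interpretability and to controllable training methods. Finally, the feasibility of building the RMFS-CNN classification framework is discussed, together with its theoretical contributions and application value for remote sensing image classification and for the development of deep learning theory.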
RMFS-CNN: new deep learning framework for remote sensing image classification
Zhao Xuemei, Wu Jun, Chen Ruixing (School of Electronic Engineering and Automation, Guilin University of Electronic Technology, Guilin 541004, China)
Traditional convolutional neural networks (CNNs) stack convolutional layers and activation functions to achieve a nonlinear transformation from input images to output labels. This end-to-end training method is convenient, but it seriously hinders the introduction of prior knowledge about remote sensing images, leading to a high dependency on the quality and quantity of training samples. The trained parameters of a CNN are used to extract features from input images; however, these features cannot be interpreted. That is, both the learning process and the learned features are uninterpretable, which further increases the dependency on training samples. Restricted by end-to-end training, traditional CNNs can only learn general features from the training set, and these general features are difficult to transfer to another training set. At present, a CNN can be applied to multiple tasks if it is trained on a training set for each target task. However, improving training accuracy on a finite training set is extremely difficult. Traditional CNNs cannot correlate the features contained in the input data with the requirements of specific applications. In addition, the loss functions available for specific applications are limited, and some of them only describe the difference between the predicted results and the corresponding labels. In such cases, the network sacrifices the disadvantaged classes to reach a global optimum, resulting in the loss of detailed information.
CNNs construct a complex nonlinear function to map input images to output labels. The features they learn cannot be understood and are difficult to merge with other features in an explainable manner. By contrast, artificial (hand-crafted) features reflect certain aspects of the information in an image, and the information they contain is meaningful, i.e., it applies to most images.
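The claim that a loss measuring only the difference between predictions and labels can sacrifice disadvantaged classes can be checked with a toy calculation. The sketch below is illustrative only: the class sizes and predicted probabilities are hypothetical, and mean cross-entropy stands in for "a loss that only compares predictions with labels". It compares a classifier that ignores an under-represented class with one that separates both classes.

```python
import math

def mean_cross_entropy(probs, labels):
    """Average negative log-probability assigned to the true class."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

def class_recall(probs, labels, cls):
    """Fraction of samples of class `cls` whose arg-max prediction is `cls`."""
    hits = [max(range(len(p)), key=lambda k: p[k]) == cls
            for p, y in zip(probs, labels) if y == cls]
    return sum(hits) / len(hits)

# Hypothetical toy set: 95 samples of class 0 and 5 samples of an under-represented class 1.
labels = [0] * 95 + [1] * 5

# Model A follows the class prior: every sample gets 0.9 for class 0,
# so all class-1 samples are misclassified.
model_a = [[0.9, 0.1]] * 100

# Model B separates the two classes correctly, but with lower confidence.
model_b = [[0.6, 0.4]] * 95 + [[0.4, 0.6]] * 5

for name, probs in (("A", model_a), ("B", model_b)):
    print(f"model {name}: loss={mean_cross_entropy(probs, labels):.3f}, "
          f"class-1 recall={class_recall(probs, labels, 1):.1f}")
```

Running it prints a lower loss for model A (0.215) than for model B (0.511), even though A misses every class-1 sample while B classifies all samples correctly. Minimizing such a loss therefore favors the model that gives up the small class.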
Artificial features can be considered prior knowledge that describes the empirical understanding of images. However, they cannot fully express the information contained in an image. Consequently, combining the advantages of CNNs and prior knowledge is an efficient way to learn the essential features of images. The Riemannian manifold feature space (RMFS) exhibits a powerful feature expression capability, through which the spectral and spatial features of an image can be unified. To benefit from both CNNs and RMFS, this study analyzes the contribution of RMFS to the interpretability of CNNs and to the evolution of image features, from the perspectives of CNN modeling and remote sensing image feature representation. Then, an RMFS-CNN classification framework is proposed to bridge the gap between CNNs and the prior knowledge of remote sensing images. First, this study proposes using CNNs instead of traditional mathematical transformations to map the original remote sensing image onto points in RMFS. Mapping via CNNs can overcome the effects of the neighborhood size and the modeling method, improving the feature expression capability of RMFS. Second, the features learned via RMFS-CNN can be customized in RMFS to highlight specific information that benefits certain applications. Furthermore, on the basis of their interpretability and evolution, the customized features can be used to design a rule-driven data perceptron. Finally, new RMFS-CNN models based on the rule-driven data perceptron can be proposed. Given the feature expression capability of RMFS, the proposed RMFS-CNN models are expected to outperform traditional models in terms of learning capability and the stability of the learned features. New loss functions that control the training process of RMFS-CNN models can be developed by incorporating the customized features in RMFS. In general, the proposed RMFS-CNN framework can bridge the gap between remote sensing prior knowledge and CNN models. Its advantages are as follows.
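The abstract does not specify the "traditional mathematical transformation" that maps pixels onto RMFS. The following is a minimal sketch assuming one common construction, not necessarily the authors' own: each pixel is described by the mean and covariance of the spectral vectors in a square window, the resulting Gaussian is embedded as a symmetric positive-definite (SPD) matrix, and the matrix logarithm flattens it into log-Euclidean coordinates. The function names and the `radius` and `eps` parameters are hypothetical; `radius` plays the role of the hand-set neighborhood size that a CNN-based mapping is meant to avoid.

```python
import numpy as np

def gaussian_to_spd(mu, cov):
    """Embed N(mu, cov) as the (d+1)x(d+1) SPD matrix [[cov + mu mu^T, mu], [mu^T, 1]]."""
    d = mu.shape[0]
    p = np.empty((d + 1, d + 1))
    p[:d, :d] = cov + np.outer(mu, mu)
    p[:d, d] = mu
    p[d, :d] = mu
    p[d, d] = 1.0
    return p

def spd_log_vector(p):
    """Log-Euclidean coordinates: matrix log via eigendecomposition, then the upper triangle."""
    w, v = np.linalg.eigh(p)
    log_p = (v * np.log(w)) @ v.T  # V diag(log w) V^T
    iu = np.triu_indices(p.shape[0])
    # Scale off-diagonal entries by sqrt(2) so Euclidean distance equals the Frobenius norm.
    weights = np.where(iu[0] == iu[1], 1.0, np.sqrt(2.0))
    return log_p[iu] * weights

def map_to_rmfs(image, radius=1, eps=1e-6):
    """Map every pixel of an (H, W, d) image to a point in a log-Euclidean feature space.

    Hypothetical construction: the (2*radius+1)^2 window is a hand-set neighborhood size.
    """
    h, w, d = image.shape
    padded = np.pad(image, ((radius, radius), (radius, radius), (0, 0)), mode="reflect")
    k = d + 1
    out = np.empty((h, w, k * (k + 1) // 2))
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + 2 * radius + 1, j:j + 2 * radius + 1].reshape(-1, d)
            mu = patch.mean(axis=0)
            # Small ridge keeps the covariance (and hence the SPD embedding) positive definite.
            cov = np.cov(patch, rowvar=False, bias=True) + eps * np.eye(d)
            out[i, j] = spd_log_vector(gaussian_to_spd(mu, cov))
    return out
```

For an `(H, W, d)` array, `map_to_rmfs(image, radius=1)` returns `(d+1)(d+2)/2` coordinates per pixel. Changing `radius` changes every feature vector, which illustrates how a hand-designed transformation ties the RMFS points to a chosen neighborhood size.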
1) Points in RMFS are interpretable owing to the excellent feature expression capability of RMFS and the one-to-one correspondence between points in RMFS and pixels in the image domain. Therefore, RMFS can connect remote sensing prior knowledge with the learning capability of CNNs. Using CNNs to learn specific information from remote sensing prior knowledge is efficient on the one hand and ensures the stability of the learned features on the other. Consequently, the dependency of CNNs on the quality and quantity of training samples can be reduced.
2) Points in RMFS contain the spectral features of the corresponding pixels and the spatial connections in their neighborhood system. Pixels representing the same object in the image domain follow a linear distribution when mapped onto RMFS. On the basis of these characteristics, RMFS can provide a platform for the interpretable features of remote sensing images. If the physical meaning and corresponding distribution of remote sensing images in RMFS are known, data-driven convolution can be converted into a rule-driven data perceptron to improve the learning capability of RMFS-CNN models. The learning process and the corresponding learned features can then be interpreted through the rule-driven data perceptron.
3) RMFS exhibits another notable distribution characteristic: data points representing the main body of an object form a linear distribution, whereas data points representing the edge of the object are randomly distributed in areas far from this linear distribution. This characteristic enables RMFS to express different features of an object separately. Accordingly, features conducive to certain applications can be customized in RMFS and then abstracted by the rule-driven data perceptron. With this feature customization capability, RMFS-CNN models can be refined in accordance with their input data and applications.
4) The RMFS-CNN framework can express the interpretable features of remote sensing images, and these features can be customized to suit the input data and the corresponding applications. The customized features carry information useful for a given application, so they can be used to define a constraint on the loss function that controls the training process of RMFS-CNN models. Because the constraint forces the network to learn features beneficial to the target application, two benefits follow: learning features favorable to the application improves the training accuracy of the network, and the interpretability of the learned features is maintained. Consequently, the trained network is easier to transfer than traditional CNNs.
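Items 2) to 4) can be illustrated with a hypothetical constraint. The sketch below is an assumption, not the loss defined by the framework. It fits a principal line to each class's feature points, treats points near the line as the object's main body and points beyond a distance threshold as edge points, and adds the mean squared distance to the class lines to a task loss with an assumed weight `lam`. All function names, the threshold, and `lam` are illustrative.

```python
import numpy as np

def principal_line(points):
    """Fit a line through the centroid along the leading principal direction (unit vector)."""
    center = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - center, full_matrices=False)
    return center, vt[0]

def line_distances(points, center, direction):
    """Perpendicular distance of each point to the line center + t * direction."""
    diff = points - center
    along = diff @ direction
    residual = diff - np.outer(along, direction)
    return np.linalg.norm(residual, axis=1)

def split_body_edge(points, threshold):
    """Hypothetical rule: points near the fitted line are main body, far points are edge."""
    center, direction = principal_line(points)
    dist = line_distances(points, center, direction)
    return dist <= threshold, dist > threshold

def linearity_penalty(features, labels):
    """Mean squared distance of each class's feature points to that class's principal line."""
    total, count = 0.0, 0
    for cls in np.unique(labels):
        pts = features[labels == cls]
        if len(pts) < 2:
            continue  # a line is undefined for fewer than two points
        center, direction = principal_line(pts)
        total += float(np.sum(line_distances(pts, center, direction) ** 2))
        count += len(pts)
    return total / max(count, 1)

def constrained_loss(task_loss, features, labels, lam=0.1):
    """Task loss plus a weighted linear-distribution constraint (lam is an assumed weight)."""
    return task_loss + lam * linearity_penalty(features, labels)
```

NumPy only evaluates the penalty here. Inside a training loop, the same computation would have to be written in an automatic-differentiation framework so that its gradient can reach the network weights.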