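Objective With the development of 3D scanning and virtual reality technology, 3D recognition of real objects has become a research hotspot. This paper proposes a new 3D object recognition method that combines a perceptron residual network with an extreme learning machine (ELM). Method Built on the ELM framework, the model uses the proposed multi-layer perceptron residual network to learn multi-view projection features of 3D objects, and recognizes 3D objects with a multi-channel ensemble classifier composed of an ELM, KNN and SVM. The network replaces traditional convolutional layers with convolutional layers augmented by a multi-layer perceptron. The convolutional network consists of improved residual units, each containing several parallel residual channels with a constant number of convolution kernels, which can separately fit residual functions of different mathematical forms. This structure increases the ability of a low-level residual network to represent features of complex 3D model data. Half of the convolution kernel parameters and perceptron parameters in the network are randomly generated from a Gaussian distribution, and the rest are obtained through training. The features extracted by the convolutional network and the known labels are used to train the ELM, KNN and SVM classification layers simultaneously. To handle the diversity of 3D object features, at prediction time the output layer uses a confidence threshold to decide whether to invoke the KNN and SVM classifiers, and a voting mechanism selects the output class. Result The proposed method achieves 94.18% accuracy on the Princeton 3D model dataset and 97.46% accuracy on the 2D NORB dataset, the best results to date on both benchmark datasets. Conclusion Experiments show that the proposed method achieves higher recognition accuracy and stronger robustness to interference than existing ELM methods and recent methods such as deep learning, while requiring fewer tunable parameters and converging faster.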
3D object recognition combining perceptron residual network and extreme learning machine
Huang Qiang, Wang Yongxiong (University of Shanghai for Science and Technology)
Objective With the development of 3D scanning and virtual reality technology, 3D recognition of real objects has become a major research topic. It is also one of the most challenging tasks in natural scene understanding. Some studies have attempted to obtain 3D features by applying image feature extraction methods through more complex deep learning networks, and they have made some progress. However, the accuracy and real-time performance of 3D object recognition still need to be improved. To address this problem, this paper proposes a new 3D object recognition model that combines a perceptron residual network with an Extreme Learning Machine (ELM). Method Built on the ELM framework, the model uses the proposed multi-layer perceptron residual network to learn multi-view projection features of 3D objects, and identifies 3D objects with a multi-channel ensemble classifier composed of an ELM, a K-nearest neighbor (KNN) classifier and a support vector machine (SVM). To increase the nonlinearity of the low-level network, the network replaces traditional convolutional layers with convolutional layers that include a multi-layer perceptron. The convolutional network consists of the proposed improved residual units. Each unit contains multiple parallel residual channels with a constant number of convolution kernels, which can fit residual functions of different mathematical forms; convolution kernels of the same size share their parameters. This structure increases the ability of low-level residual networks to represent features of complex 3D model data. Unlike the traditional ELM, half of the convolution kernel parameters and perceptron parameters in the network are randomly generated from a Gaussian distribution, and the rest are obtained through training. A mask layer is added after the convolutional layer to prevent non-maximum suppression. 
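The ELM idea the model builds on (hidden parameters drawn from a Gaussian distribution and never trained, output weights solved in closed form) can be illustrated with a minimal NumPy sketch. This is a generic single-hidden-layer ELM, not the authors' network: the function names, the tanh activation, the ridge term `reg` (a common regularized variant of the pseudo-inverse solution), and the softmax used to turn scores into probabilities are all assumptions made for illustration.

```python
import numpy as np

def train_elm(X, T, n_hidden=64, reg=1e-3, seed=0):
    """Fit a basic ELM: random Gaussian hidden layer, closed-form output layer."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights, never trained
    b = rng.standard_normal(n_hidden)                 # random hidden biases, never trained
    H = np.tanh(X @ W + b)                            # hidden-layer activations
    # Ridge-regularized least squares: (H^T H + reg * I) beta = H^T T
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ T)
    return W, b, beta

def elm_scores(model, X):
    """Raw class scores, one column per class."""
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta

def softmax(z):
    """Assumed mapping from ELM scores to class probabilities."""
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Toy usage: two Gaussian blobs standing in for extracted feature vectors.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 1.0, (100, 2)), rng.normal(2.0, 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
T = np.eye(2)[y]  # one-hot targets
model = train_elm(X, T)
probs = softmax(elm_scores(model, X))
accuracy = (probs.argmax(axis=1) == y).mean()
```

Only `beta` is learned, and it comes from a single linear solve, which is why ELM-style training has few tunable parameters and converges quickly.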
The mask image is a binary image obtained by removing the background and irrelevant elements from the original image. All pooling layers use mean pooling. The extracted features and the known labels are used to train the ELM, KNN and SVM classification layers. Because the multi-view projections of a 3D object yield a large number of diverse features, prediction proceeds in stages. The ELM classifier is applied first, and the difference between the largest and the second largest predicted probabilities is computed. If this difference is too small, the prediction is considered likely to be incorrect owing to strong noise interference: the class with the second largest probability may be the true class, so other classifiers should be used for the test sample. A confidence threshold at the output layer lets the network decide whether to use the KNN and SVM classifiers, and a voting mechanism selects the output class. Result The proposed method achieves 94.18% accuracy on the Princeton 3D model dataset and 97.46% accuracy on the NORB 2D image dataset. The Princeton 3D model dataset is a widely used benchmark for validating 3D object recognition; its 3D models include common furniture, vehicles, musical instruments, electronics, and so on. The NORB dataset is one of the most commonly used image datasets. Our method achieves the best results to date on both benchmark datasets. Conclusion Experiments show that the proposed method achieves higher recognition accuracy and stronger robustness to interference than existing ELM methods and deep learning methods. It also has fewer tunable parameters and converges faster. The proposed network is suitable not only for 3D object recognition but also for general image recognition. 
This paper explores a network that can handle high-dimensional data with low complexity, and experiments show that it performs well.
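The confidence-threshold output rule described in the Method section can be sketched in plain Python. This is an illustrative reading, not the authors' code: the function name `ensemble_predict`, the threshold value 0.1, passing KNN and SVM as zero-argument callables (so they run only when the ELM margin is small), and the fallback to the ELM's top class when all three voters disagree are all assumptions.

```python
from collections import Counter

def ensemble_predict(elm_probs, knn_predict, svm_predict, threshold=0.1):
    """Hypothetical confidence-threshold ensemble of ELM, KNN and SVM.

    elm_probs: class probabilities from the ELM layer for one sample.
    knn_predict, svm_predict: zero-argument callables returning a class label;
    they are only invoked when the ELM prediction looks unreliable.
    """
    ranked = sorted(range(len(elm_probs)), key=lambda c: elm_probs[c], reverse=True)
    top1, top2 = ranked[0], ranked[1]
    margin = elm_probs[top1] - elm_probs[top2]  # largest minus second largest probability
    if margin >= threshold:
        return top1  # ELM is confident enough: KNN and SVM are skipped
    # Low margin: consult the other classifiers and take a majority vote.
    votes = Counter([top1, knn_predict(), svm_predict()])
    label, count = votes.most_common(1)[0]
    # With three voters, count == 1 means all disagree; fall back to the ELM's choice.
    return label if count >= 2 else top1
```

Deferring the KNN and SVM calls keeps the common, confident case as cheap as a single ELM forward pass.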