3D object recognition combining perceptron residual network and extreme learning machine
2019, Vol. 24, No. 10, pp. 1738-1749
Received: 2018-12-28; Revised: 2019-04-08; Accepted: 2019-04-15; Published in print: 2019-10-16
DOI: 10.11834/jig.180680
Objective
With the development of 3D scanning and virtual reality technology, the 3D recognition of real objects has become a popular research topic. To address the long training time and unsatisfactory recognition performance of existing deep-learning-based methods, this paper proposes a 3D object recognition method that combines a perceptron residual network with an extreme learning machine (ELM).
Method
On the basis of the extreme learning machine framework, a multi-layer perceptron residual network learns the multi-view projection features of 3D objects, and the extracted features together with the known labels are used to train an ELM classification layer, a K-nearest neighbor (KNN) classification layer, and a support vector machine (SVM) classification layer for 3D object recognition. The network replaces traditional convolutional layers with convolutional layers augmented by multi-layer perceptrons. The convolutional network consists of improved residual units, each containing multiple parallel residual channels with a constant number of convolution kernels, which fit residual functions of different mathematical forms. Half of the convolution kernel parameters and perceptron parameters in the network are randomly generated from a Gaussian distribution, and the rest are obtained through training.
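The ELM training step described above can be sketched as follows. This is a minimal illustration with hypothetical dimensions and toy data, not the authors' implementation: hidden-layer weights are drawn from a Gaussian distribution and then fixed, and only the output weights are solved in closed form via the Moore-Penrose pseudoinverse.

```python
import numpy as np

def elm_train(X, Y, n_hidden=32, seed=0):
    """Minimal ELM: fixed random Gaussian hidden layer, closed-form output weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights (never trained)
    b = rng.normal(size=n_hidden)                 # random biases (never trained)
    H = np.tanh(X @ W + b)                        # hidden-layer activations
    beta = np.linalg.pinv(H) @ Y                  # least-squares output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy example: 4 samples, 2 classes with one-hot targets (XOR-like labels).
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = np.array([[1., 0.], [0., 1.], [0., 1.], [1., 0.]])
W, b, beta = elm_train(X, Y, n_hidden=32)
pred = elm_predict(X, W, b, beta).argmax(axis=1)
```

Because the hidden layer has more units than there are training samples, the pseudoinverse solution interpolates the training targets exactly; no iterative optimization is needed, which is where the large training-time saving comes from.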
Result
The proposed method achieves 94.18% accuracy on the Princeton 3D model dataset and 97.46% accuracy on the 2D NORB dataset, the best results reported on both international benchmark datasets. Moreover, thanks to the extreme learning machine framework, the training time of the proposed algorithm is three orders of magnitude shorter than that of deep-learning-based methods.
Conclusion
This paper presents a method for recognizing 3D objects from multi-view images. Experiments show that it achieves higher recognition accuracy and stronger robustness to interference than existing ELM methods and recent deep learning methods, with fewer tunable parameters and faster convergence.
Objective
With the development of 3D scanning and virtual reality technology, the 3D recognition of real objects has become a major research topic and one of the most challenging tasks in natural scene understanding. Recognizing objects from smartphone photos is widely used because 2D images are easy to acquire and process. Recent advances in real-time SLAM and laser scanning have made 3D models of real objects increasingly available, creating a strong need for effective methods that process 3D models and recognize the corresponding 3D objects or scenes. Some studies have used image-based methods to obtain 3D features through deep convolutional neural networks, and these have high memory efficiency. Others have used point-set or volume-based methods, whose input forms are closer to the structure of the actual objects; accordingly, the networks become more complicated and require substantial computing resources. Although these studies have made progress, the accuracy and real-time performance of 3D object recognition still need improvement. To address this problem, this study proposes a new 3D object recognition model that combines a perceptron residual network with an extreme learning machine (ELM).
Method
On the basis of the extreme learning machine framework, the model uses the proposed multi-layer perceptron residual network to learn the multi-view projection features of 3D objects. It also uses a multi-channel integrated classifier composed of an extreme learning machine, a K-nearest neighbor (KNN) classifier, and a support vector machine (SVM) to identify 3D objects. This is not simply a stack of classifiers, which would carry a high risk of overfitting. After the prediction output vector of the ELM is obtained, the difference $$e$$ between the largest and the second-largest probability values is calculated. A small difference indicates that the two corresponding categories are both close to the true category and that the current classifier has a high probability of misclassification; in that case, the other two classifiers are used. Without loss of precision, comparing the difference $$e$$ with a threshold $$T$$ avoids running multiple classifiers every time: unlike AdaBoost, the network uses only one classifier in most cases. To increase the nonlinearity of the low-level network, a convolutional layer with a multi-layer perceptron replaces the traditional convolutional layer. The convolutional network consists of the proposed improved residual units. Each unit contains multiple parallel residual channels with a constant number of convolution kernels, which fit residual functions of different mathematical forms; convolution kernel parameters of the same size are shared. Unlike the traditional extreme learning machine, half of the convolution kernel parameters and perceptron parameters in the network are randomly generated from a Gaussian distribution, and the remaining parameters are obtained through training. The extracted feature data and the known label data are used to train the ELM classification layer, the KNN classification layer, and the SVM classification layer. A confidence threshold at the output layer lets the network decide whether to use the KNN and SVM classifiers, and a voting mechanism selects the output class.
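The confidence-gating rule described above can be sketched as follows. The function name and the example probabilities are illustrative, not from the paper: the gap $$e$$ between the two largest ELM probabilities is compared with the threshold $$T$$, and the KNN and SVM classifiers are consulted only when the ELM is uncertain.

```python
from collections import Counter

def gated_predict(elm_probs, knn_label, svm_label, T=0.2):
    """Use the ELM alone when confident; otherwise vote with KNN and SVM.

    elm_probs: ELM output probabilities per class.
    knn_label, svm_label: fallback predictions (computed lazily in practice).
    """
    top2 = sorted(elm_probs, reverse=True)[:2]
    e = top2[0] - top2[1]              # gap between best and runner-up class
    elm_label = elm_probs.index(max(elm_probs))
    if e >= T:                         # ELM is confident: single classifier suffices
        return elm_label
    # ELM is uncertain: majority vote among the three classifiers.
    votes = Counter([elm_label, knn_label, svm_label])
    return votes.most_common(1)[0][0]

confident = gated_predict([0.7, 0.2, 0.1], knn_label=1, svm_label=1)     # gap 0.5 >= T
uncertain = gated_predict([0.45, 0.40, 0.15], knn_label=1, svm_label=1)  # gap 0.05 < T
```

In the confident case the ELM's class 0 is returned directly; in the uncertain case KNN and SVM outvote the ELM and class 1 is returned. This is how the network keeps the cost of a single classifier in most cases, unlike boosting schemes that always evaluate every member.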
Result
The proposed method achieves 94.18% accuracy on the Princeton 3D model dataset and 97.46% accuracy on the NORB 2D image dataset. The Princeton 3D model dataset is a widely used benchmark for validating 3D object recognition; its models include common furniture, vehicles, musical instruments, and electronics. The NORB dataset is one of the most commonly used image datasets. Our method achieves the best results on both benchmark datasets. Within the extreme learning machine framework, the training time of the proposed algorithm is three orders of magnitude shorter than that of other deep learning methods, making the approach suitable for practical applications. In addition, we verify the effects of different parameters on recognition performance, such as the number of projected views, the number of residual channels, and the confidence threshold $$T$$ of the classification layer.
Conclusion
Experiments show that the proposed method achieves higher recognition accuracy and stronger anti-interference than existing ELM methods and deep learning methods, with fewer tunable parameters and faster convergence. The proposed network is suitable for both 3D object recognition and general image recognition. This study explores a network that handles high-dimensional data with low complexity, and experiments demonstrate its excellent performance.