Parallel cross deep convolution neural networks model
2016, Vol. 21, No. 3, pp. 339-347
Online publication: 2016-03-07
Print publication: 2016
DOI: 10.11834/jig.20160308

Image classification and recognition is a classic problem in computer vision and underlies techniques such as image retrieval, object recognition, and video analysis and understanding. Models based on deep convolution neural networks (CNN) have achieved major breakthroughs in this field, far surpassing traditional models based on hand-crafted features. However, many deep models have huge numbers of neurons and parameters and are difficult to train. Drawing on deep CNN models and the principles of human vision, this paper proposes and designs a deep parallel cross CNN model (the PCCNN model). Built on Alex-Net, the model extracts two groups of deep CNN features through two deep CNN data-transform flows; at the top of the model, two rounds of mixing and crossing yield a 1024-dimensional image feature vector, and Softmax regression is finally used to classify and recognize images. Compared with similar models, the features extracted by this model are more discriminative and give better classification and recognition performance: on Caltech101, the top-1 recognition accuracy reaches about 63%, nearly 5% higher than VGG16 and nearly 10% higher than GoogLeNet; on Caltech256, the top-1 recognition accuracy exceeds 46%, nearly 5% higher than VGG16 and 2.6% higher than GoogLeNet. The PCCNN model is effective for image classification and recognition and outperforms comparable models on medium-scale datasets, although its performance on large-scale datasets remains to be verified. The model also offers a new idea for designing other deep CNN models: while controlling depth, extract more feature information to improve model performance.
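The fusion-and-classification step described above can be illustrated with a toy sketch. This is not the paper's implementation: the two "streams" are stand-in random linear layers, and all dimensions and weights are made-up assumptions; only the overall shape of the computation (two parallel feature groups, concatenated into a 1024-D vector, then Softmax regression) follows the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def stream_features(x, W):
    # Placeholder for one deep CNN data-transform flow: ReLU(x @ W).
    return np.maximum(0.0, x @ W)

# Toy dimensions (assumptions, not from the paper, except the 1024-D fusion
# and the 101 classes of Caltech101).
D_in, D_feat, n_classes = 256, 512, 101

W1 = rng.standard_normal((D_in, D_feat)) * 0.01          # stream 1
W2 = rng.standard_normal((D_in, D_feat)) * 0.01          # stream 2
Wc = rng.standard_normal((2 * D_feat, n_classes)) * 0.01  # Softmax regression weights

x = rng.standard_normal(D_in)       # a flattened toy "image"

f1 = stream_features(x, W1)         # first group of CNN features
f2 = stream_features(x, W2)         # second group of CNN features

# "Cross": fuse the two feature groups into one 1024-D vector,
# then classify with Softmax regression.
fused = np.concatenate([f1, f2])    # shape (1024,)
probs = softmax(fused @ Wc)         # class probabilities, shape (101,)
```

The design choice the sketch mirrors is that the classifier sees both feature groups at once, so information missed by one stream can be recovered from the other.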
The classification and recognition of images play an important role in a number of applications, such as image retrieval, object detection, and video content analysis. Nowadays, a major breakthrough has been achieved with deep convolution neural network (CNN) models, which have surpassed state-of-the-art methods for image classification and recognition because the features extracted by CNN models are more discriminative and contain more semantic information than those of traditional approaches. However, some CNN models, such as Alex-Net and ZFCNN-Net, are extremely simple and incapable of extracting enough information to represent images, while other models, such as VGG16/VGG19 and GoogLeNet, have huge numbers of neurons and parameters. In this work, a novel model named deep parallel cross CNN (PCCNN) is proposed, which can extract more effective information from images while using fewer neurons and parameters than other models. Inspired by the mechanism of human vision, which has two visual pathways and an optic chiasma, the proposed PCCNN is designed on the basis of Alex-Net and extracts two groups of CNN features in parallel through a pair of deep CNN data-transform flows. After the first fully connected layer in each stream, the information of the two streams is fused; the fused information is forwarded to the next two fully connected layers, and the output information is fused again to obtain more powerful representative features. Finally, for image classification, Softmax regression is applied to the 1024D image feature vector obtained by fusing the two feature groups. Note that Alex-Net is used as the base model because of its simple architecture and relatively small number of neurons. In the PCCNN model, the first stream is the original Alex-Net, and in the second stream, a stride of 6 instead of 4 is used in the first convolutional layer. A larger stride in the convolutional layer gives worse performance if only a single stream is used, because more information is lost; however, when the two streams are combined, the proposed model outperforms all the other models. In addition, because a larger stride is used in the second stream, the feature maps are smaller, and the number of neurons and parameters does not increase greatly. Several popular public datasets, namely Caltech101, Caltech256, and Scene15, were selected to evaluate the performance of our model, and several state-of-the-art models were implemented with the same settings for comparison. Experimental results demonstrate that the proposed PCCNN model achieves better image classification performance than these models, indicating that the features extracted with the PCCNN model are more discriminative and have stronger representation ability. On the Caltech101 dataset, the top-1 accuracy of the PCCNN model reaches approximately 63%, exceeding that of VGG16 by about 5% and that of GoogLeNet by about 10%. On the Caltech256 dataset, our model also performs better than the other models, with a top-1 accuracy of 46.4%, surpassing VGG16 and GoogLeNet by 5% and 2.6%, respectively. However, our model performs worse than GoogLeNet on the Scene15 dataset, although it is still more accurate than a single Alex-Net. Overall, the proposed PCCNN model outperforms several state-of-the-art CNN models in image classification and recognition, particularly on medium-scale datasets, but does not exhibit better performance on the small-scale dataset. Hence, the model should be further tested on large-scale vision tasks, such as the ImageNet or SUN datasets, which is the next work the authors plan to do. In fact, the PCCNN model is not only applicable to image classification and recognition but also provides a novel way of thinking about deep CNN model design. In a deep CNN model, the deeper the architecture is, the more neurons and parameters exist, and the complexity also increases significantly. Thus, the width of the model can instead be increased to extract more features and obtain better performance. Although this approach also increases the number of neurons and parameters, the rate of increase is slower than when more layers are added to a single model; furthermore, the model is more in line with the human visual physiological mechanism. Finally, the PCCNN model has great extensibility.
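As a back-of-the-envelope illustration of why the stride-6 second stream adds relatively few neurons, the standard valid-convolution output-size formula can be applied to Alex-Net-style first-layer numbers (the 227x227 input and 11x11 kernel are assumptions about the base Alex-Net configuration, not figures taken from this abstract):

```python
def conv_out(size, kernel, stride):
    # Spatial output size of a valid convolution: (in - kernel) // stride + 1.
    return (size - kernel) // stride + 1

s4 = conv_out(227, 11, 4)   # stride 4, as in the original Alex-Net stream -> 55
s6 = conv_out(227, 11, 6)   # stride 6, as in the second PCCNN stream -> 37

# The stride-6 feature maps are 37x37 instead of 55x55, i.e. roughly 45% as
# many spatial positions per map, so the extra stream is comparatively cheap.
ratio = (s6 * s6) / (s4 * s4)
print(s4, s6, round(ratio, 2))
```

This is the sense in which widening the model with a coarser parallel stream grows the neuron and parameter count more slowly than stacking additional layers onto a single stream.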