Combining attention mechanism and knowledge distillation for Siamese network compression
2020, Vol. 25, No. 12, pp. 2563-2577
Received: 2020-02-18; Revised: 2020-03-30; Accepted: 2020-04-06; Published in print: 2020-12-16
DOI: 10.11834/jig.200051
Objective
Deep Siamese networks have been applied to image co-segmentation and significantly improve segmentation accuracy. However, they require a huge amount of computation, which limits their applications. We therefore propose a Siamese network compression method that fuses a binarized attention mechanism with knowledge distillation, aiming to obtain a Siamese network with low computational cost and high segmentation accuracy.
Method
We first propose a binarized attention mechanism and apply it to the Siamese network to extract the important knowledge of the large network. The original large network is then restructured according to the dimensions of this important knowledge, yielding the structure of the small Siamese network. Next, an improved knowledge distillation method transfers the knowledge of the large network to the small one: during the transfer, the important middle-layer knowledge of the large network and then the ground-truth labels successively guide the training of the small network, producing the weights of the target small Siamese network.
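The binarized attention step can be pictured as a 0/1 channel mask shared by the two Siamese branches, keeping only channels that respond in both images. The sketch below is a minimal NumPy illustration under an assumed channel-scoring rule (the product of globally pooled activations from the two branches); the paper's actual scoring function is not given in this abstract.

```python
import numpy as np

def binary_attention(feat_a, feat_b, thresh=0.5):
    """Zero out channels unlikely to describe the common object.

    feat_a, feat_b: (C, H, W) responses from the two Siamese branches.
    The channel score (product of pooled activations) is an illustrative
    assumption, not the paper's formula.
    """
    # (C,) channel descriptors via global average pooling of magnitudes
    da = np.abs(feat_a).mean(axis=(1, 2))
    db = np.abs(feat_b).mean(axis=(1, 2))
    score = da * db  # high only if the channel fires in BOTH branches
    # binarize: keep channels scoring above a fraction of the maximum
    mask = (score >= thresh * score.max()).astype(feat_a.dtype)
    # broadcast the 0/1 channel mask over the spatial dimensions
    return feat_a * mask[:, None, None], feat_b * mask[:, None, None], mask
```

Because most channels are zeroed, the masked responses are channel-sparse, which is what makes the subsequent 1×1 convolution able to map them into a matrix with far fewer channels.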
Result
Experimental results show that the proposed method compresses the original Siamese network to 1/3.3 of its original size, significantly reducing the amount of computation, while the segmentation results are close to the best of existing co-segmentation methods. On the MLMR-COS dataset, the segmentation accuracy of the compressed small network is slightly higher than that of the large network, with the average Jaccard index improved by 0.07%. On the Internet dataset, the average Jaccard index of the small network is 5% higher than the best result of traditional image segmentation methods and matches the best result of existing deep co-segmentation methods. On the iCoseg dataset, whose images are relatively complex, the accuracy of the compressed small network is only slightly lower than the best results of traditional and deep co-segmentation methods.
Conclusion
The proposed Siamese network compression method significantly reduces the amount of computation and the number of parameters, with segmentation performance close to the best of existing co-segmentation methods.
Objective
Image co-segmentation refers to segmenting common objects from image groups that contain the same or similar objects (foregrounds). Deep neural networks are widely used in this task given their excellent segmentation results. The end-to-end Siamese network is one of the most effective networks for image co-segmentation. However, this network has huge computational costs, which greatly limit its applications. Therefore, network compression is required. Although various network compression methods have been presented in the literature, they are mainly designed for single-branch networks and do not consider the characteristics of a Siamese network. To this end, we propose a novel network compression method specifically for Siamese networks.
Method
The proposed method transfers the important knowledge of a large network to a compressed small network and involves three steps. First, we acquire the important knowledge of the large network. To fulfill this task, we develop a binary attention mechanism that is applied to each stage of the encoder module of the Siamese network. This mechanism preserves the features of common objects and suppresses the features of non-common objects in the two images. As a result, the response of each stage of the Siamese network is represented as a matrix with sparse channels. We map this sparse response matrix to a dense matrix with a smaller channel dimension through a convolution layer with 1×1 kernels. This dense matrix represents the important knowledge of the large network. Second, we build the small network structure. As described in the first step, the number of channels needed to represent the knowledge in each stage of the large network can be reduced. Accordingly, the number of channels in each convolution and normalization layer within each stage can also be reduced. Therefore, we reconstruct each stage of the large network according to the channel dimensions of the dense matrices obtained in the first step, which determines the final small network structure. Third, we transfer the knowledge from the large network to the compressed small network through a two-step knowledge distillation method. In the first step, the output of each stage/deconvolutional layer of the large network is used as the supervision information: we use the Euclidean distance between the middle-layer outputs of the large and small networks as the loss function, so that these outputs are as similar as possible at the end of this training stage. In the second step, we compute the Dice loss between the network output and the ground-truth label to guide the final refinement of the small network and further improve segmentation accuracy.
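The two losses driving the two distillation steps can be written compactly. The NumPy sketch below assumes the teacher and student feature maps already share shapes (e.g., after the 1×1 mapping described above) and uses the standard Dice formulation; it is an illustration of the loss definitions, not code from the paper.

```python
import numpy as np

def middle_layer_loss(student_feats, teacher_feats):
    """Step 1: sum of Euclidean distances between corresponding
    middle-layer outputs of the small (student) and large (teacher)
    networks. Each element is an ndarray of identical shape."""
    return sum(np.sqrt(((s - t) ** 2).sum())
               for s, t in zip(student_feats, teacher_feats))

def dice_loss(pred, label, eps=1e-6):
    """Step 2: Dice loss between the foreground probability map and
    the ground-truth binary mask (both in [0, 1], same shape)."""
    inter = (pred * label).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + label.sum() + eps)
```

Training the student on the middle-layer loss first, then fine-tuning on the Dice loss, mirrors the two-stage schedule described above.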
Result
We perform two groups of experiments on three datasets, namely, MLMR-COS, Internet, and iCoseg. MLMR-COS is a large-scale dataset with pixel-wise ground truth, on which an ablation study is performed to verify the rationality of the proposed method. Meanwhile, although Internet and iCoseg are commonly used co-segmentation datasets, they are too small to serve as training sets for deep-learning-based methods. Therefore, we train our network on a training set generated from Pascal VOC 2012 and MSRC before testing it on Internet and iCoseg to verify its effectiveness. Experimental results show that the proposed method reduces the size of the original Siamese network by a factor of 3.3, thereby significantly reducing the required amount of computation. Moreover, compared with existing deep-learning-based co-segmentation methods, the proposed method significantly reduces the computation required by the compressed network, whose segmentation accuracy on the three datasets is close to the state of the art. On the MLMR-COS dataset, the compressed small network obtains an average Jaccard index that is 0.07% higher than that of the original large network. On the Internet and iCoseg datasets, we compare the compressed network with 12 traditional supervised/unsupervised image co-segmentation methods and 3 deep-learning-based co-segmentation methods. On the Internet dataset, the compressed network achieves a Jaccard index 5% higher than the best of the traditional image segmentation methods and matches the best existing deep-learning-based co-segmentation methods. On the iCoseg dataset, whose images are relatively complex, the segmentation accuracy of the compressed small network is only slightly lower than that of the other methods.
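The Jaccard index reported throughout these results is the standard intersection-over-union of the predicted and ground-truth foreground masks; a minimal NumPy sketch (not code from the paper):

```python
import numpy as np

def jaccard_index(pred, label):
    """Intersection over union of two binary masks."""
    pred, label = pred.astype(bool), label.astype(bool)
    union = np.logical_or(pred, label).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return np.logical_and(pred, label).sum() / union
```

A reported difference such as "0.07% higher" is a difference between such per-image scores averaged over a dataset.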
Conclusion
We propose a network compression method that combines a binary attention mechanism with knowledge distillation and apply it to a Siamese network for image co-segmentation. The compressed network significantly reduces the amount of computation and the number of parameters of the Siamese network while achieving co-segmentation performance similar to that of state-of-the-art methods.