Clothing Retrieval by Deep Multi-label Parsing and Hashing
- 2019, Vol. 24, No. 2, pp. 159-169
Received: 2018-04-17; Revised: 2018-06-14; Published in print: 2019-02-16
DOI: 10.11834/jig.180361
Objective
Clothing retrieval plays an important role in online clothing promotion and sales. However, current retrieval algorithms cannot accurately retrieve clothing that lacks a textual description, and their accuracy still needs improvement, especially for cross-scene multi-label clothing images. To address the large variation among cross-scene multi-label clothing images and the excessively high dimensionality of convolutional-network output features, this paper proposes a clothing retrieval algorithm based on deep multi-label parsing and hashing.
Method
The method first adds conditional random fields (CRFs) on top of a fully convolutional network (FCN) to post-process the FCN output, building an end-to-end structure of coarse FCN segmentation followed by fine CRF refinement, which achieves pixel-level semantic recognition. Second, targeting the characteristics of cross-scene clothing retrieval, we adjust the CCP (Clothing Co-Parsing) dataset and construct a Consumer-to-Shop dataset. To handle the semantic drift that easily arises during retrieval, a multi-task learning network is used to train a clothing classification model and a clothing similarity model.
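The multi-task training described here (a clothing-classification model plus a clothing-similarity model) can be sketched as a weighted sum of two task losses. This is a minimal NumPy illustration under assumed forms, not the paper's actual network: the cross-entropy classification head, the contrastive similarity head, and the weight `alpha` are all hypothetical choices.

```python
import numpy as np

def softmax_cross_entropy(logits, label):
    # classification head: cross-entropy over clothing categories
    z = logits - logits.max()              # shift for numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return -np.log(p[label])

def contrastive_loss(f1, f2, same, margin=1.0):
    # similarity head: pull same-ID pairs together, push different pairs
    # apart up to a margin (hypothetical margin value)
    d = np.linalg.norm(f1 - f2)
    return d ** 2 if same else max(0.0, margin - d) ** 2

def multitask_loss(logits, label, f1, f2, same, alpha=0.5):
    # weighted sum of the two task losses; alpha is a hypothetical weight
    return (alpha * softmax_cross_entropy(logits, label)
            + (1 - alpha) * contrastive_loss(f1, f2, same))
```

In a real pipeline both heads would share one convolutional feature extractor, so gradients from both losses shape the shared features.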
Result
We first conduct comparative clothing parsing experiments on the Consumer-to-Shop dataset; the results show that adding CRFs as post-processing clearly improves parsing quality. We then compare against three mainstream retrieval algorithms; even with hash features, our method achieves good retrieval performance, with a top-5 accuracy 1.31% higher than WTBI (where to buy it) and 0.21% higher than DARN (dual attribute-aware ranking network).
Conclusion
To address the poor cross-scene performance and low efficiency of clothing retrieval, this paper proposes a fast multi-target clothing retrieval method based on pixel-level semantic segmentation and hash coding. Compared with other retrieval methods, ours has clear advantages in multi-target, multi-label clothing retrieval scenarios, and while maintaining retrieval quality it effectively reduces storage space and improves retrieval efficiency.
Objective
Clothing retrieval is a technology that combines clothing detection, clothing classification, and feature learning, and it plays an important role in clothing promotion and sales. Current clothing retrieval algorithms are mainly based on deep neural networks: they learn high-dimensional features of a clothing image through the network and compare these features across images to determine clothing similarity. Such algorithms usually suffer from a semantic-gap problem; they cannot connect clothing features with semantic information such as color, texture, and style, which limits their interpretability. As a result, they adapt poorly to other domains and often fail to retrieve clothing with new styles. The accuracy of clothing retrieval therefore needs to be improved, especially for cross-domain multi-label clothing images. This study proposes a new clothing retrieval pipeline with deep multi-label parsing and hashing to increase cross-domain retrieval accuracy and reduce the dimensionality of the features output by the deep neural network.
Method
On the basis of the semantic expression of street-shot photos, we introduce and improve a fully convolutional network (FCN) structure to parse clothing at the pixel level. To overcome fragmented labels and noise, we apply conditional random fields (CRFs) to the FCN output as a post-processing step. In addition, a new image retrieval algorithm based on multi-task learning and hashing is proposed to address the semantic-gap problem and the curse of dimensionality in clothing retrieval. On the basis of the extracted image features, a hashing algorithm maps the high-dimensional feature vectors to a low-dimensional Hamming space while preserving their similarities. Hence, the dimensionality problem in clothing retrieval can be solved, and real-time performance can be achieved. Moreover, we reorganize the Consumer-to-Shop database for cross-scene clothing retrieval. The database is organized by shop and consumer photos to ensure that the clothes under the same ID are similar. We also propose a clothing classification model and integrate it into a traditional clothing similarity model to overcome the semantic-drift problem. In summary, the proposed clothing retrieval model consists of two parts. The first is a semantic segmentation network for street-shot photos, which identifies the specific clothing targets in an image. The second is a hashing model based on the multi-task network, which maps the high-dimensional network features to a low-dimensional Hamming space.
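The mapping from high-dimensional features to Hamming space can be illustrated with the simplest similarity-preserving scheme, sign-of-random-projection hashing (a form of locality-sensitive hashing). The paper learns its hash function inside the network, so treat this as a conceptual sketch only; the feature dimension `D` and code length `B` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D, B = 4096, 48                      # hypothetical feature dim and hash bits
W = rng.standard_normal((D, B))      # fixed random projection matrix

def hash_code(feature):
    # project the high-dimensional feature and keep only the signs,
    # giving a B-bit binary code
    return (feature @ W > 0).astype(np.uint8)

def hamming(a, b):
    # number of differing bits between two binary codes
    return int(np.count_nonzero(a != b))
```

Because the codes are binary, Hamming distance reduces to a popcount of an XOR, which is why hashed retrieval is fast and storage-light compared with searching raw float features.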
Result
We modify the Clothing Co-Parsing dataset and establish the Consumer-to-Shop dataset, on which we conduct clothing parsing experiments. We find that the FCN may drop the detailed features of an image: after several up-sampling operations, the segmentation results show blurred edges and a color-blocking effect. To overcome these limitations, CRFs are used for subsequent correction. The experimental results show that, after adding CRFs as post-processing, many areas are assigned the correct labels and fine color blocks are replaced by smooth segmentation results that are easily recognized by human intuition. We then compare our method with three mainstream retrieval algorithms, and the results show that our method achieves top-level accuracy while using hash features. Its top-5 accuracy is 1.31% higher than that of WTBI and 0.21% higher than that of DARN.
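Top-k accuracy as used in this comparison counts a query as correct when at least one of its k nearest gallery items carries the query's clothing ID. A minimal sketch of the metric (the distance matrix and ID arrays are hypothetical inputs):

```python
import numpy as np

def top_k_accuracy(dist, query_ids, gallery_ids, k=5):
    # dist: (num_queries, num_gallery) pairwise distances,
    # e.g. Hamming distances between hash codes
    hits = 0
    for q, row in enumerate(dist):
        nearest = np.argsort(row)[:k]             # indices of k closest items
        if query_ids[q] in gallery_ids[nearest]:  # hit if any shares the ID
            hits += 1
    return hits / len(dist)
```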
Conclusion
We propose a deep multi-label parsing and hashing retrieval network to increase the efficiency and accuracy of clothing retrieval. For the clothing parsing task, the modified FCN-CRFs model shows the best subjective visual quality among the compared methods and achieves superior time performance. For the clothing retrieval task, an approximate nearest-neighbor search technique is employed, and a hashing algorithm is used to compress the high-dimensional features. At the same time, the clothing classification and clothing similarity models are trained with a multi-task learning network to suppress the semantic-drift phenomenon during retrieval. Compared with other clothing retrieval methods, ours shows clear advantages in multi-label clothing retrieval scenarios: it achieves the highest top-10 accuracy, effectively reduces storage space, and improves retrieval efficiency.
Liu S, Liu L Q, Yan S C. Fashion analysis: current techniques and future directions[J]. IEEE MultiMedia, 2014, 21(2): 72-79. [DOI:10.1109/MMUL.2014.25]
Chen H, Xu Z J, Liu Z Q, et al. Composite templates for cloth modeling and sketching[C]//Proceedings of 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. New York, NY, USA: IEEE, 2006: 943-950. [DOI:10.1109/CVPR.2006.81]
Yang M, Yu K. Real-time clothing recognition in surveillance videos[C]//Proceedings of 2011 18th IEEE International Conference on Image Processing. Brussels, Belgium: IEEE, 2011: 2937-2940. [DOI:10.1109/ICIP.2011.6116276]
Li Z M, Li Y T, Liu Y J, et al. Clothing retrieval combining hierarchical over-segmentation and cross-domain dictionary learning[J]. Journal of Image and Graphics, 2017, 22(3): 358-365. [DOI:10.11834/jig.20170310]
Yamaguchi K, Kiapour M H, Ortiz L E, et al. Parsing clothing in fashion photographs[C]//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, RI, USA: IEEE, 2012: 3570-3577. [DOI:10.1109/CVPR.2012.6248101]
Liu S, Song Z, Liu G C, et al. Street-to-shop: cross-scenario clothing retrieval via parts alignment and auxiliary set[C]//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, RI, USA: IEEE, 2012: 3330-3337. [DOI:10.1109/CVPR.2012.6248071]
Liang X D, Lin L, Yang W, et al. Clothes co-parsing via joint image segmentation and labeling with application to clothing retrieval[J]. IEEE Transactions on Multimedia, 2016, 18(6): 1175-1186. [DOI:10.1109/TMM.2016.2542983]
Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation[C]//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE, 2015: 3431-3440. [DOI:10.1109/CVPR.2015.7298965]
Chen F, Lv S H, Li J, et al. Multi-label image retrieval by hashing with object proposal[J]. Journal of Image and Graphics, 2017, 22(2): 232-240. [DOI:10.11834/jig.20170211]
Xu L, Ren J S J, Liu C, et al. Deep convolutional neural network for image deconvolution[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal, Canada: MIT Press, 2014: 1790-1798.
Zheng S, Jayasumana S, Romera-Paredes B, et al. Conditional random fields as recurrent neural networks[C]//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015: 1529-1537. [DOI:10.1109/ICCV.2015.179]
Chen L C, Papandreou G, Kokkinos I, et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 834-848. [DOI:10.1109/TPAMI.2017.2699184]
Chen Q, Huang J S, Feris R, et al. Deep domain adaptation for describing people based on fine-grained clothing attributes[C]//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE, 2015: 5315-5324. [DOI:10.1109/CVPR.2015.7299169]
Zhang Z P, Luo P, Loy C C, et al. Facial landmark detection by deep multi-task learning[C]//Proceedings of 2014 European Conference on Computer Vision. Cham: Springer, 2014: 94-108. [DOI:10.1007/978-3-319-10599-4_7]
Freire-Obregón D, Castrillón-Santana M, Ramón-Balmaseda E, et al. Automatic clothes segmentation for soft biometrics[C]//Proceedings of 2014 IEEE International Conference on Image Processing. Paris, France: IEEE, 2014: 4972-4976. [DOI:10.1109/ICIP.2014.7026007]
Yang W, Luo P, Lin L. Clothing co-parsing by joint image segmentation and labeling[C]//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, OH, USA: IEEE, 2014: 3182-3189. [DOI:10.1109/CVPR.2014.407]
Yamaguchi K, Kiapour M H, Ortiz L E, et al. Retrieving similar styles to parse clothing[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(5): 1028-1040. [DOI:10.1109/TPAMI.2014.2353624]
Cychnerski J, Brzeski A, Boguszewski A, et al. Clothes detection and classification using convolutional neural networks[C]//Proceedings of 2017 IEEE International Conference on Emerging Technologies and Factory Automation. IEEE, 2017: 1-8. [DOI:10.1109/ETFA.2017.8247638]
Lin K, Yang H F, Liu K H, et al. Rapid clothing retrieval via deep learning of binary codes and hierarchical search[C]//Proceedings of the 5th ACM International Conference on Multimedia Retrieval. Shanghai, China: ACM, 2015: 499-502. [DOI:10.1145/2671188.2749318]
Huang J S, Feris R, Chen Q, et al. Cross-domain image retrieval with a dual attribute-aware ranking network[C]//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015: 1062-1070. [DOI:10.1109/ICCV.2015.127]
Liu Z W, Luo P, Qiu S, et al. DeepFashion: powering robust clothes recognition and retrieval with rich annotations[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 1096-1104. [DOI:10.1109/CVPR.2016.124]
Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[C]//Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, Nevada: Curran Associates Inc., 2012: 1097-1105.
Xia R K, Pan Y, Lai H J, et al. Supervised hashing for image retrieval via image representation learning[C]//Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence. [S.l.]: AAAI, 2014.
Kulis B, Grauman K. Kernelized locality-sensitive hashing for scalable image search[C]//Proceedings of 2009 IEEE 12th International Conference on Computer Vision. Kyoto, Japan: IEEE, 2009: 2130-2137. [DOI:10.1109/ICCV.2009.5459466]
Lai H J, Yan P, Shu X B, et al. Instance-aware hashing for multi-label image retrieval[J]. IEEE Transactions on Image Processing, 2016, 25(6): 2469-2479. [DOI:10.1109/TIP.2016.2545300]
Lin K, Yang H F, Hsiao J H, et al. Deep learning of binary hash codes for fast image retrieval[C]//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Boston, MA, USA: IEEE, 2015: 27-35. [DOI:10.1109/CVPRW.2015.7301269]
Liu H M, Wang R P, Shan S G, et al. Deep supervised hashing for fast image retrieval[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 2064-2072. [DOI:10.1109/CVPR.2016.227]
Lai H J, Pan Y, Liu Y, et al. Simultaneous feature learning and hash coding with deep neural networks[C]//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE, 2015: 3270-3278. [DOI:10.1109/CVPR.2015.7298947]