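Objective Clothing retrieval plays an important role in the online promotion and sale of clothing. However, current clothing retrieval algorithms cannot accurately retrieve clothing that has no text description. In particular, retrieval accuracy on cross-scene, multi-label clothing images still needs to be improved. Cross-scene multi-label clothing images vary widely, and the features output by convolutional neural networks are very high-dimensional. To address both problems, this paper proposes a clothing retrieval algorithm based on deep multi-label parsing and hashing. Method The method first adds conditional random fields (CRFs) on top of a fully convolutional network (FCN) to post-process the FCN output. This yields an end-to-end network in which the FCN performs coarse segmentation and the CRFs perform fine segmentation, achieving pixel-level semantic recognition. Next, to reflect the characteristics of cross-scene clothing retrieval, we adjusted the Clothing Co-Parsing (CCP) dataset and constructed a Consumer-to-shop dataset. To counter the semantic drift that tends to occur during retrieval, we use a multi-task learning network to train a clothing classification model and a clothing similarity model. Result We first ran comparative clothing-parsing experiments on the Consumer-to-shop dataset. The results show that adding CRFs as post-processing clearly improves clothing parsing. We then compared our method with three mainstream retrieval algorithms. The results show that our method achieves good retrieval performance even when it uses hash features: its top-5 accuracy is 1.31% higher than that of WTBI and 0.21% higher than that of DARN. Conclusion To address the poor cross-scene performance and low efficiency of clothing retrieval, this paper proposes a fast multi-target clothing retrieval method based on pixel-level semantic segmentation and hash coding. Compared with other retrieval methods, ours has some advantages in multi-target, multi-label clothing retrieval scenarios, and it effectively reduces storage space and improves retrieval efficiency while maintaining reasonable retrieval performance.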
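The coarse-to-fine idea (FCN scores refined by CRFs) can be illustrated with a minimal sketch. It is a deliberate simplification and not the authors' implementation: the FCN is replaced by a hypothetical hand-written probability map, and instead of a fully connected CRF with appearance kernels it uses a 4-neighbour Potts model solved by a few rounds of mean-field inference. The names `mean_field_potts` and `argmax_labels` are illustrative only. The point is to show how a smoothness term can overturn an isolated noisy label that a plain per-pixel argmax would keep.

```python
import math

def mean_field_potts(probs, weight=1.0, iters=5):
    """Refine per-pixel class probabilities with a 4-neighbour Potts CRF.

    probs[y][x][l] stands in for a (hypothetical) FCN softmax probability of
    label l at pixel (y, x). Returns the refined hard label map.
    """
    height, width = len(probs), len(probs[0])
    n_labels = len(probs[0][0])
    # Unary term: log-probabilities from the coarse segmentation.
    log_p = [[[math.log(max(p, 1e-12)) for p in probs[y][x]]
              for x in range(width)] for y in range(height)]
    q = [[list(probs[y][x]) for x in range(width)] for y in range(height)]
    for _ in range(iters):
        new_q = []
        for y in range(height):
            row = []
            for x in range(width):
                # Potts pairwise term: neighbours that believe in label l make
                # l cheaper here (the label-independent part cancels out).
                support = [0.0] * n_labels
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        for l in range(n_labels):
                            support[l] += q[ny][nx][l]
                logits = [log_p[y][x][l] + weight * support[l] for l in range(n_labels)]
                m = max(logits)  # subtract the max for numerical stability
                exps = [math.exp(v - m) for v in logits]
                total = sum(exps)
                row.append([e / total for e in exps])
            new_q.append(row)
        q = new_q  # synchronous (parallel) mean-field update
    return [[max(range(n_labels), key=q[y][x].__getitem__) for x in range(width)]
            for y in range(height)]

def argmax_labels(probs):
    """Per-pixel argmax with no smoothing, i.e. the raw coarse segmentation."""
    return [[max(range(len(p)), key=p.__getitem__) for p in row] for row in probs]

# A 3x3 "garment" region (label 1) with one noisy pixel in the centre.
probs = [[[0.1, 0.9] for _ in range(3)] for _ in range(3)]
probs[1][1] = [0.6, 0.4]
print(argmax_labels(probs)[1][1])     # 0: the raw argmax keeps the noise
print(mean_field_potts(probs)[1][1])  # 1: the smoothness term relabels it
```

The pairwise `weight` trades fidelity to the coarse scores against smoothness; with `weight=0.0` the sketch reduces to the plain argmax.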
Clothes Retrieval by Deep Multi-Label Parsing and Hashing
Yuan Weifeng, Guo Jiaming, Su Zhuo, Luo Xiaonan, Zhou Fan (School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510006; National Engineering Research Center of Digital Life, Sun Yat-sen University, Guangzhou 510006; School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004)
Objective Clothing retrieval combines clothing detection, clothing classification, and feature learning, and it plays an important role in clothing promotion and sales. Current clothing retrieval algorithms are mainly based on deep neural networks: they first learn high-dimensional features of a clothing image with the network and then compare these features across images to judge clothing similarity. Such algorithms usually suffer from the semantic gap problem. They cannot link the learned features to semantic attributes such as color, texture, and style, which makes them hard to interpret. As a result, they adapt poorly to new domains and often fail to retrieve clothing in new styles. For cross-domain, multi-label clothing images in particular, retrieval accuracy still needs to be improved. In this paper, we propose a new clothing retrieval pipeline based on deep multi-label parsing and hashing, which aims to increase cross-domain retrieval accuracy and to reduce the dimensionality of the features output by the deep neural network. Method Based on the semantic content of street-shot photos, we adopt and improve the Fully Convolutional Network (FCN) architecture to parse clothing at the pixel level. To overcome fragmented labels and noise, we apply conditional random fields (CRFs) to the FCN output as post-processing. In addition, to address the semantic gap and the curse of dimensionality in clothing retrieval, we propose a new image retrieval algorithm based on multi-task learning and hashing. Starting from the extracted image features, the hashing algorithm maps high-dimensional feature vectors into a low-dimensional Hamming space while preserving their similarity.
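The mapping from high-dimensional features to a Hamming space can be pictured with random-hyperplane hashing (SimHash). This is a swapped-in technique chosen for illustration: the paper learns its hash function inside a multi-task network, whereas here the projections are random and untrained, the feature vectors are synthetic, and the names `make_projections`, `hash_code`, `hamming`, and `search` are hypothetical. Each bit records which side of a random hyperplane a feature falls on, so vectors separated by a small angle tend to share most bits, and ranking by Hamming distance needs only XOR and a bit count.

```python
import random

def make_projections(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes (untrained, illustration only)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def hash_code(feature, projections):
    """Pack the sign of each projection into an integer bit string."""
    code = 0
    for bit, plane in enumerate(projections):
        if sum(f * p for f, p in zip(feature, plane)) >= 0.0:
            code |= 1 << bit
    return code

def hamming(a, b):
    """Number of differing bits between two codes."""
    return bin(a ^ b).count("1")

def search(query_code, db_codes, k):
    """Indices of the k database codes closest in Hamming distance (ties by index)."""
    order = sorted(range(len(db_codes)),
                   key=lambda i: (hamming(query_code, db_codes[i]), i))
    return order[:k]

rng = random.Random(42)
dim, n_bits = 128, 32
planes = make_projections(dim, n_bits)
# Synthetic stand-ins for the network features of five shop images.
database = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(5)]
codes = [hash_code(v, planes) for v in database]
# A slightly perturbed copy of item 3 plays the role of a consumer photo.
query = [v + rng.gauss(0.0, 0.05) for v in database[3]]
print(search(hash_code(query, planes), codes, k=3))
```

With random hyperplanes the chance that a bit differs is proportional to the angle between the two vectors, which is why a near-duplicate query should land at a small Hamming distance from its source item; a learned hash, as in the paper, aims to make this also reflect clothing semantics.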
Hashing thus alleviates the curse of dimensionality in clothing retrieval and makes real-time performance achievable. Moreover, we reorganized the Consumer-to-shop database for cross-scene clothing retrieval. The database is organized by shop photos and consumer photos so that the clothes under the same ID are similar. We also propose a clothing classification model and integrate it with the traditional clothing similarity model to overcome the semantic drift problem. In summary, the clothing retrieval model in this paper has two parts. The first is a semantic segmentation network for street-shot photos, which identifies the specific clothing targets in an image. The second is a hash model based on a multi-task network, which maps high-dimensional network features into a low-dimensional hash space. Result We modify the Clothing Co-Parsing (CCP) dataset and set up the Consumer-to-shop dataset. On the modified dataset, we first conduct clothing parsing experiments. We find that the FCN can lose fine details of an image, and after several up-sampling operations the segmentation results show blurred edges and blocky color artifacts. To overcome these shortcomings, the method applies CRFs as a subsequent correction step. The experimental results show that after CRFs are added as post-processing, more regions receive the correct labels, and small fragmented color blocks are replaced by smoother segmentation results that agree better with human intuition. We then compare our method with three mainstream retrieval algorithms, and the results show that our method achieves better retrieval results even when it uses hash features.
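One common way to set up a multi-task network that combines a classification model with a similarity model is a shared embedding scored by two losses. The abstract does not state the loss functions, so everything below is an assumption: softmax cross-entropy on the clothing category, a contrastive loss on a consumer/shop image pair (same ID or not), and a hypothetical weight `lam` combining them. The classification term encourages the features to keep category semantics, which is one plausible way to limit semantic drift, such as retrieving a visually similar garment from the wrong category.

```python
import math

def softmax_cross_entropy(logits, label):
    """Classification loss: -log softmax(logits)[label], computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def contrastive_loss(emb_a, emb_b, same_item, margin=1.0):
    """Similarity loss: pull matching pairs together, push others past the margin."""
    d = math.dist(emb_a, emb_b)
    return d * d if same_item else max(0.0, margin - d) ** 2

def multitask_loss(logits_a, cat_a, logits_b, cat_b, emb_a, emb_b,
                   same_item, lam=0.5, margin=1.0):
    """Hypothetical weighted sum of both tasks for one consumer/shop pair."""
    cls = softmax_cross_entropy(logits_a, cat_a) + softmax_cross_entropy(logits_b, cat_b)
    sim = contrastive_loss(emb_a, emb_b, same_item, margin)
    return cls + lam * sim

# Usage: a consumer photo and a shop photo of the same item, both correctly
# classified as category 0, whose embeddings are 0.2 apart.
loss = multitask_loss([2.0, 0.0, 0.0], 0, [2.0, 0.0, 0.0], 0,
                      [0.0, 0.0], [0.2, 0.0], same_item=True)
print(loss)
```

In a real network both losses would be back-propagated through the shared feature extractor; this sketch only evaluates the objective for a single pair.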
Its top-5 accuracy is 1.31% higher than that of WTBI and 0.21% higher than that of DARN. Conclusion To improve the efficiency and accuracy of clothing retrieval, we propose a deep multi-label parsing and hashing retrieval network. In the clothing parsing task, the modified FCN-CRFs model gives better subjective visual results than the other methods and achieves superior time performance. In the clothing retrieval task, an approximate nearest neighbor search technique is employed, and a hashing algorithm is used to compress the high-dimensional features. To counter semantic drift during retrieval, the clothing classification model and the clothing similarity model are trained with a multi-task learning network. Compared with other clothing retrieval methods, our method shows some advantages in multi-label clothing retrieval scenarios. It achieves the highest top-10 accuracy, and it effectively reduces storage space and improves retrieval efficiency.
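Top-k accuracy and the storage saving from hashing are both simple to compute. The sketch assumes the usual consumer-to-shop protocol, in which a query counts as a hit if its matching shop item appears among the first k results; the abstract does not spell out the exact protocol. The 4096-dimensional float32 feature and the 64-bit code in the example are hypothetical sizes, not the paper's settings, and the function names are illustrative.

```python
def top_k_accuracy(rankings, ground_truth, k):
    """Fraction of queries whose ground-truth item is among the first k results."""
    if not ground_truth:
        raise ValueError("need at least one query")
    hits = sum(1 for ranked, gt in zip(rankings, ground_truth) if gt in ranked[:k])
    return hits / len(ground_truth)

def float_feature_bytes(n_items, dim, bytes_per_value=4):
    """Storage for dense float features (4 bytes per value for float32)."""
    return n_items * dim * bytes_per_value

def hash_code_bytes(n_items, n_bits):
    """Storage for binary codes, rounding each code up to whole bytes."""
    return n_items * ((n_bits + 7) // 8)

if __name__ == "__main__":
    # Ranked database indices for three queries, and each query's true match.
    rankings = [[3, 1, 2], [0, 2, 1], [5, 4, 6]]
    ground_truth = [1, 1, 7]
    for k in (1, 2, 3):
        print(k, top_k_accuracy(rankings, ground_truth, k))
    dense = float_feature_bytes(1000, 4096)  # 16384000 bytes
    compact = hash_code_bytes(1000, 64)      # 8000 bytes
    print(dense, compact, dense // compact)  # 2048x smaller
```

Under these assumed sizes each item shrinks from 16384 bytes to 8 bytes, and comparing two codes costs one XOR and a bit count instead of a 4096-term distance.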