Published: 2019-06-16
DOI: 10.11834/jig.180538
2019 | Volume 24 | Number 6

Chinagraph 2018

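FMatchNet algorithm for fast clothing matching

Liu Yujie¹, Feng Shihe¹, Li Zongmin¹, Li Hua²

1. College of Computer and Communication Engineering, China University of Petroleum, Qingdao 266580;

2. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190

Received: 2018-09-11; Revised: 2019-01-13

Funding: National Natural Science Foundation of China (61379106, 61379082, 61227802); Natural Science Foundation of Shandong Province (ZR2013FM036, ZR2015FM011)

About the first author: Liu Yujie, male, associate professor and master's supervisor. His research interests include computer graphics and image processing, multimedia data analysis, multimedia databases, and multimedia data compression. E-mail: liuyujie@upc.edu.cn;
Feng Shihe, male, master's degree. His research interests include computer graphics and image processing, and machine learning. E-mail: 714209099@qq.com;
Li Hua, male, professor and doctoral supervisor. His research interests include computer graphics and image processing. E-mail: lihua@ict.ac.cn.

CLC number: TP301.6

Document code: A

Article ID: 1006-8961(2019)06-0979-08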

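Abstract

Objective Existing clothing matching systems take too long to extract deep features from clothing images for matching. To address this problem, a new network, FMatchNet, is proposed that extracts hash features for fast clothing matching. Method First, the faster region-based convolutional neural network (Faster-RCNN) is used to detect the clothing in an image; matching on the detected clothing preserves as much clothing information as possible and eliminates interference from the background. A deep convolutional neural network then extracts the deep features of the clothing and produces its hash code, and clothing matching is completed with query expansion. The model is trained as a Siamese network so that the hash codes preserve the semantic information of the clothing images as far as possible. In addition, because large-scale fashion clothing databases are currently lacking, this paper expands a fashion clothing database with fine-grained annotations. Result The proposed method is verified on the FClothes database and compared with current popular methods. With a hash length of 16, the accuracy of matching upper and lower garments reaches 50.81%, and matching is nearly three times faster than the baseline algorithm. Conclusion For large-scale clothing matching, a new network, FMatchNet, is proposed that extracts features for fast clothing matching. It improves both the accuracy and the speed of clothing matching and is suitable for everyday clothing matching.

Keywords

clothing matching; Siamese network; Hashing; query expansion; Faster-RCNN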

FMatchNet algorithm for fast clothing matching

Liu Yujie¹, Feng Shihe¹, Li Zongmin¹, Li Hua²

1. College of Computer and Communication Engineering, China University of Petroleum, Qingdao 266580, China;

2. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China

Supported by: National Natural Science Foundation of China (61379106, 61379082, 61227802); Natural Science Foundation of Shandong Province, China (ZR2013FM036, ZR2015FM011)

Abstract

Objective With the development of artificial intelligence and online shopping, clothing matching based on clothing images has become important in helping merchants promote sales. A growing number of young consumers buy clothing online, but existing research focuses mainly on clothing retrieval, clothing recommendation, and fashion trends. Quickly, accurately, and effectively matching suitable clothing to the clothing a user has already purchased remains a challenging task. As the economy develops and living standards rise, clothing styles and quantities keep increasing, so matching among a large number of garments is vital. Existing clothing matching frameworks spend a long time extracting deep features from clothing images. To address this problem, this study proposes a new network, FMatchNet, which extracts hash features for fast clothing matching.
Method Deep learning is an important development in machine learning and artificial intelligence, and deep convolutional neural networks have become one of the most effective means of extracting image features. Early methods relied on hand-crafted features, such as the scale-invariant feature transform, speeded-up robust features, and the histogram of oriented gradients. Features extracted by deep neural networks are more accurate than these traditional features. In addition, representing image features as binary hash codes is an effective way to reduce overhead and increase computation speed. The core of clothing matching is the description of clothing image content. To match clothing efficiently, the content of a clothing image must be described, and the basic idea is to express the image as a feature vector. In general, the better two clothing images match, the smaller the distance between their feature vectors. Recent studies in many image fields have begun to generate hash codes from features extracted by deep networks. This study applies the same idea to a clothing image representation that combines deep learning with hash codes, and proposes a fast, accurate, and effective clothing matching network, FMatchNet. The faster region-based convolutional neural network (Faster-RCNN) is adopted to detect the clothing regions in an image. Using the clothing regions preserves as much of the original clothing information as possible and eliminates interference from the image background. A two-branch deep convolutional neural network then extracts the deep feature and the hash code of each garment from its region. Finally, clothing matching is completed with query expansion. The model is trained as a Siamese network so that the hash codes preserve the semantic information of the clothing images as far as possible. The hash code is used to select a candidate set of matching garments, and the deep feature is then used to rank the candidates within that set. In addition, given the lack of large-scale fashion clothing databases, a fine-grained annotated fashion clothing database, FClothes, has been expanded in this study. Its images come mainly from the Weibo website and include a large number of popular, high-resolution clothing pictures that meet fashion demands. The algorithm is verified experimentally on the expanded database.
Result The proposed method was verified on the expanded FClothes database and compared with current popular methods. Hash codes of 8, 16, and 32 bits were compared, and the experimental results show that matching precision increases with code length, as does time consumption. With 16-bit and 32-bit hash codes, the matching accuracy for upper and lower garments is higher than that of the baseline. With a 16-bit hash code, the matching accuracy of the proposed method for upper and lower garments is 50.81%, and matching is nearly three times faster than the baseline algorithm. The baseline comes from "Learning visual clothing style with heterogeneous dyadic co-occurrences", presented at the International Conference on Computer Vision 2015. In both accuracy and time, the proposed algorithm outperforms the compared methods for upper and lower clothing matching.
Conclusion To address large-scale clothing matching, this study proposes a new network, FMatchNet, for feature extraction, which improves the precision and speed of clothing matching and is suitable for everyday clothing matching.

Key words

clothing matching; Siamese network; Hashing; query expansion (QE); Faster-RCNN

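0 Introduction

With the continued development of artificial intelligence and online shopping, clothing matching recommendation based on clothing images is important in helping merchants promote sales. According to the research organization China Internet Watch (https://www.chinainternetwatch.com/), China's online retail market for fashion products such as clothing, footwear, and accessories reached USD 187.5 billion in 2016, which reflects the demand for clothing. Quickly, accurately, and effectively matching suitable clothing to the items a user has already purchased, so as to encourage the next purchase, is therefore a challenging task.

The core of clothing matching is the description of clothing image content. To match clothing well, the content of a clothing image must be described, and the basic idea is to represent the image as a feature vector^[1]. In general, the better two garments match, the smaller the distance between their feature vectors. Representing a clothing image with a fixed-length vector supports clothing matching, but it cannot keep up with today's rapidly growing and highly varied clothing.

Deep learning is an important advance in machine learning and artificial intelligence. Deep convolutional neural networks have become one of the most effective means of extracting image features. Early feature extraction relied mainly on hand-crafted features such as the scale-invariant feature transform (SIFT)^[1]; compared with such traditional features, features extracted by deep neural networks^[2] are more accurate. In addition, representing image features as binary hash codes is an effective way to reduce overhead and increase computation speed. Many recent studies in image analysis have therefore begun to generate hash codes from features extracted by deep neural networks. This paper follows the same idea and studies a clothing image representation that combines deep learning with hash codes.

This paper proposes FMatchNet, a fast, accurate, and effective clothing matching network. First, the faster region-based convolutional neural network (Faster-RCNN)^[3] is used to detect the target regions and segment them out. Second, a deep convolutional neural network extracts features that describe the clothing regions. To produce hash-coded features, a hash layer $ \mathit{\boldsymbol{H}}$ is inserted between two fully connected layers. Fig. 1 shows the basic framework, where $ \mathit{\boldsymbol{W}}$ is the shared parameter vector of the network and $ \mathit{\boldsymbol{\varGamma }}$ is the contrastive loss function. In addition, this paper expands the FClothes clothing dataset. Its images come mainly from websites such as Weibo and Tieba and include a large number of popular, high-resolution clothing images that meet fashion demands. The dataset contains 13 228 images annotated with clothing categories and attributes; the annotation details are described in our previous work^[4]. The main contributions of this paper are summarized as follows: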

Fig. 1 The overall framework of the fast clothing matching network

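1) A new two-branch network for learning a fast style space is proposed, which combines a Siamese CNN with hash coding. The network extracts hash features and deep features from clothing images; the hash features are used for matching, and the deep features are then used to re-rank the matches.

2) A fine-grained annotated clothing database, FClothes, is expanded. It contains 13 228 images with complete category and attribute annotations.

1 Related work

With the growth of online shopping, clothing and fashion have attracted increasing attention from the computer vision and multimedia research communities. Existing research focuses mainly on clothing retrieval^[5], clothing recommendation^[6], and fashionability prediction^[7]. For example, Liu et al.^[6] proposed the clothing recommendation system "Magic Closet", which introduces intermediate attribute features of clothing and trains a classifier with a latent support vector machine (latent SVM)^[8] to recommend clothing for different occasions. Inspired by this, our previous work^[4] developed a practical weather-oriented clothing recommendation system. Zhang et al.^[9] improved clothing classification accuracy by optimizing a residual network. Yang et al.^[10] applied image augmentation on top of a convolutional neural network and further improved the accuracy and speed of clothing classification. Yuan et al.^[11] added a conditional random field to a fully convolutional network and achieved fast end-to-end multi-object clothing retrieval. Veit et al.^[12] proposed a clothing matching framework oriented to human visual preference, in which the matching pairs in the dataset are derived from clothing co-purchased on Amazon. They extract visual features with a convolutional neural network (CNN) and introduce a similarity metric to model the human notion of clothing matching. Song et al.^[13] then built on this work by introducing personal and community factors into clothing matching; they model the matching between clothing items within a Bayesian personalized ranking framework, which jointly captures the correlation between clothing modalities and implicit matching preferences. Unlike the work above, this paper focuses on how to match clothing quickly within a clothing database. As the economy develops and living standards rise, clothing styles are becoming more diverse and the number of garments keeps growing, so matching quickly among a large number of garments is important. Using hash features can speed up clothing matching; this paper therefore proposes a network that learns a fast style space for fast clothing matching.

Many hashing algorithms exist^[14-15], and they fall into two broad classes: unsupervised and supervised. Unsupervised hashing methods learn a set of hash functions from unlabeled data^[14-16]. The most representative is locality-sensitive hashing (LSH)^[14], which aims to maximize the probability that similar data are mapped to similar binary codes; it generates binary codes by projecting data points onto random hyperplanes with random thresholds. Spectral hashing^[15] is another representative method, which produces binary codes by thresholding nonlinear functions along the principal component analysis (PCA) directions of the data. Supervision can improve the learning of binary hash codes: supervised methods^[17-19] incorporate label information during learning and typically use pairwise labels to produce effective hash functions. However, these algorithms usually require a large sparse matrix to describe the similarity between data points in the training set. Deep neural network hashing (DNNH)^[20] is a strong image retrieval method that replaces the fully connected layer with a partially connected layer in which each part learns one binary bit. Fang et al.^[21] learned similar images with a deep network and then accelerated retrieval with LSH^[14]. Lin et al.^[22] proposed a deep hashing method for fast image retrieval. In contrast to [21], this paper learns the hash codes of clothing images by inserting a hash layer into the network; drawing on the method of Lin et al.^[22], it proposes a two-branch network to learn these hash codes and validates it on the FClothes dataset.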
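The random-hyperplane idea behind LSH described above can be illustrated with a short NumPy sketch. This is a simplified illustration rather than the exact algorithm of [14]: the function names `lsh_codes` and `hamming`, the Gaussian hyperplanes, and the way the thresholds are drawn from the observed projection range are all assumptions made for the example.

```python
import numpy as np

def lsh_codes(x, n_bits, seed=0):
    """Binary codes from random projections with random thresholds (illustrative)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # One random hyperplane normal per output bit (assumed Gaussian).
    planes = rng.standard_normal((x.shape[1], n_bits))
    proj = x @ planes
    # One random threshold per bit, drawn inside the range of that projection.
    thresholds = rng.uniform(proj.min(axis=0), proj.max(axis=0))
    return (proj >= thresholds).astype(np.uint8)

def hamming(a, b):
    """Number of bit positions in which two codes differ."""
    return int(np.count_nonzero(a != b))

if __name__ == "__main__":
    data = np.array([[1.0, 0.0], [0.9, 0.1], [-1.0, 0.5]])
    codes = lsh_codes(data, n_bits=8)
    print(hamming(codes[0], codes[1]), hamming(codes[0], codes[2]))
```

Nearby points tend to fall on the same side of most hyperplanes, so their codes tend to differ in few bits; the learned hash layer used in this paper replaces these random projections with trained ones.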

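2 Proposed method

The proposed solution to fast clothing matching is shown in Fig. 1. This section first describes clothing region detection with Faster-RCNN, which localizes clothing regions effectively, and then presents the proposed method for learning the fast style network.

2.1 Clothing region detection

Starting from the dataset of our previous work^[4], 5 000 outdated images were removed and 1 600 new images were added. Complex backgrounds pose a serious challenge to clothing matching, as shown in Fig. 2, and algorithms based on traditional shape features handle them poorly. Faster-RCNN is therefore used to preprocess the images: it extracts the bounding boxes of the clothing regions so that the regions can be cropped and the influence of cluttered backgrounds removed. The upper- and lower-body clothing regions are annotated on the expanded dataset, the Faster-RCNN framework is trained to detect upper- and lower-body clothing, and the images are segmented according to the detection results. At test time, only the highest-scoring region among all detections is selected.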
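The test-time rule of keeping only the highest-scoring detection can be sketched as follows. The detector itself is not shown; the function name `crop_best_region` and the input layout (boxes as `(x1, y1, x2, y2)` pixel coordinates with one score per box) are assumptions for illustration, and in practice the function would be called once for the upper-body class and once for the lower-body class.

```python
import numpy as np

def crop_best_region(image, boxes, scores):
    """Crop the highest-scoring box from an H x W x C image array.

    boxes:  (N, 4) array-like of (x1, y1, x2, y2) pixel coordinates (assumed layout).
    scores: (N,) array-like of detection confidences.
    """
    boxes = np.asarray(boxes, dtype=float)
    best = int(np.argmax(scores))               # index of the top-scoring detection
    x1, y1, x2, y2 = np.round(boxes[best]).astype(int)
    h, w = image.shape[:2]
    x1, x2 = max(0, x1), min(w, x2)             # clip the box to the image bounds
    y1, y2 = max(0, y1), min(h, y2)
    return image[y1:y2, x1:x2]

if __name__ == "__main__":
    img = np.zeros((10, 10, 3), dtype=np.uint8)
    crop = crop_best_region(img, [[0, 0, 5, 5], [2, 3, 6, 8]], [0.4, 0.9])
    print(crop.shape)
```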

Fig. 2 Clothes in complex backgrounds

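2.2 Network framework

As shown in Fig. 1, FMatchNet adopts a hybrid Siamese CNN + Hashing structure, and each clothing image is projected by the network into a fast style space. Similar to the Siamese network^[23], the LeNet network is modified by inserting a hash layer between two fully connected layers, and two copies of the modified network are then joined following the Siamese design. $\mathit{\boldsymbol{q}}_{\rm up}$ and $\mathit{\boldsymbol{q}}_{\rm low}$ denote the upper-body and lower-body clothing regions, respectively. Let $y$ be the binary label of a clothing pair: $y=0$ if the regions $\mathit{\boldsymbol{q}}_{\rm up}$ and $\mathit{\boldsymbol{q}}_{\rm low}$ are incompatible, and $y=1$ otherwise. $\mathit{\boldsymbol{W}}$ is the shared parameter vector of the network, $ \mathit{\boldsymbol{H}}$ is the hash layer inserted into the network, and $ \mathit{\boldsymbol{\varGamma }}(y)$ is the loss function used for training. Let $\mathit{\boldsymbol{G}}_{W}(\mathit{\boldsymbol{q}}_{\rm up})$ and $\mathit{\boldsymbol{G}}_{W}(\mathit{\boldsymbol{q}}_{\rm low})$ be the two points in the low-dimensional compatibility space to which the network maps $\mathit{\boldsymbol{q}}_{\rm up}$ and $\mathit{\boldsymbol{q}}_{\rm low}$. Learning the network can then be viewed as learning an energy function $ \mathit{\boldsymbol{E}}_{W}(\mathit{\boldsymbol{q}}_{\rm up}, \mathit{\boldsymbol{q}}_{\rm low})$ that measures the compatibility of the upper-body region $\mathit{\boldsymbol{q}}_{\rm up}$ and the lower-body region $\mathit{\boldsymbol{q}}_{\rm low}$: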

$ \boldsymbol{E}_{ W}\left(\boldsymbol{q}_{\mathrm{up}}, \boldsymbol{q}_{\mathrm{low}}\right)=\left\|\boldsymbol{G}_{{W}}\left(\boldsymbol{q}_{\mathrm{up}}\right)-\boldsymbol{G}_{{W}}\left(\boldsymbol{q}_{\mathrm{low}}\right)\right\| $

(1)

The network is trained on positive pairs $(\mathit{\boldsymbol{q}}_{\rm up}, \mathit{\boldsymbol{q}}_{\rm low})$ and negative pairs $(\mathit{\boldsymbol{q}}_{\rm up}, \mathit{\boldsymbol{q}}'_{\rm low})$ from the dataset, and the learned network satisfies the condition

$ \boldsymbol{E}_{W}\left(\boldsymbol{q}_{\mathrm{up}}, \boldsymbol{q}_{\mathrm{low}}\right)+m^{2}<\boldsymbol{E}_{W}\left(\boldsymbol{q}_{\mathrm{up}}, \boldsymbol{q}_{\mathrm{low}}^{\prime}\right) $

(2)

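where $m$ is a threshold. Several thresholds were tried in the experiments and the best-performing one was chosen, giving $m^{2}=0.2$. A contrastive loss is used to train the network. The loss function $\mathit{\boldsymbol{\varGamma}}(y)$ is defined as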

$ \boldsymbol{\varGamma}(y)=\sum\limits_{i=1}^{p} \boldsymbol{L}\left(y, \left(\boldsymbol{q}_{\mathrm{up}}, \boldsymbol{q}_{\mathrm{low}}\right)\right)^{i} $

(3)

$ \begin{array}{c}{\boldsymbol{L}\left(y, \left(\boldsymbol{q}_{\mathrm{up}}, \boldsymbol{q}_{\mathrm{low}}\right)\right)^{i}=y \boldsymbol{L}_{\mathrm{C}}\left(\boldsymbol{E}_{W}\left(\boldsymbol{q}_{\mathrm{up}}, \boldsymbol{q}_{\mathrm{low}}\right)^{i}\right)+} \\ {(1-y) \boldsymbol{L}_{\mathrm{I}}\left(\boldsymbol{E}_{W}\left(\boldsymbol{q}_{\mathrm{up}}, \boldsymbol{q}_{\mathrm{low}}\right)^{i}\right)}\end{array} $

(4)

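where $(y, (\mathit{\boldsymbol{q}}_{\rm up}, \mathit{\boldsymbol{q}}_{\rm low}))^{i}$ is the $i$-th sample, $y$ is its label (positive or negative pair), $\mathit{\boldsymbol{L}}_{\rm C}$ and $\mathit{\boldsymbol{L}}_{\rm I}$ are the partial loss functions for compatible and incompatible pairs, and $p$ is the number of training samples. $\mathit{\boldsymbol{L}}_{\rm C}$ and $\mathit{\boldsymbol{L}}_{\rm I}$ are designed so that minimizing $ \mathit{\boldsymbol{L}}$ lowers the energy of compatible pairs and raises the energy of incompatible pairs.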
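A minimal NumPy sketch of Eqs. (1), (3), and (4) follows. The paper does not give closed forms for $\mathit{\boldsymbol{L}}_{\rm C}$ and $\mathit{\boldsymbol{L}}_{\rm I}$, so the sketch assumes the common contrastive-loss choice $L_{\rm C}(E)=\frac{1}{2}E^{2}$ and $L_{\rm I}(E)=\frac{1}{2}\max(0, m-E)^{2}$. These forms, the `margin` parameter, and the function names are assumptions, not the authors' implementation.

```python
import numpy as np

def energy(g_up, g_low):
    """Eq. (1): Euclidean distance between the two embeddings, one row per pair."""
    return np.linalg.norm(np.asarray(g_up, float) - np.asarray(g_low, float), axis=1)

def contrastive_loss(g_up, g_low, y, margin=1.0):
    """Eqs. (3)-(4) with assumed partial losses L_C and L_I.

    y[i] = 1 marks a compatible pair and y[i] = 0 an incompatible pair.
    """
    y = np.asarray(y, dtype=float)
    e = energy(g_up, g_low)
    l_c = 0.5 * e ** 2                              # assumed L_C: pull compatible pairs together
    l_i = 0.5 * np.maximum(0.0, margin - e) ** 2    # assumed L_I: push incompatible pairs past the margin
    return float(np.sum(y * l_c + (1.0 - y) * l_i))

if __name__ == "__main__":
    up = [[0.0, 0.0], [0.0, 0.0]]
    low = [[3.0, 4.0], [0.6, 0.8]]
    print(contrastive_loss(up, low, y=[1, 0], margin=2.0))
```

Under these assumed forms, an incompatible pair whose energy already exceeds the margin contributes nothing to the loss, which is what allows the condition in Eq. (2) to be reached.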

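2.3 Feature extraction and matching

To match clothing images effectively while reducing computational cost, the proposed method converts the deep feature vector into a binary code. Such binary codes can be compared quickly by hashing or by Hamming distance. To realize this idea, a hash layer $ \mathit{\boldsymbol{H}}$ is embedded between two fully connected layers, building on the network of [12], as shown in Fig. 1. Given an image $ \mathit{\boldsymbol{I}}$, the output of the hash layer, $ \mathit{\boldsymbol{H}}_{\rm out}$, is obtained first and then binarized with a threshold. The resulting binary code of $ \mathit{\boldsymbol{H}}$ is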

$ {{\mathit{\boldsymbol{H}}}^{j}}=\left\{ \begin{matrix} 1 & \mathit{\boldsymbol{H}}_{\text{out}}^{j}\ge 0.5 \\ 0 & \text{otherwise} \\ \end{matrix} \right. $

(5)

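where $j=1, 2, \ldots, h$ indexes the bits of the hash layer $ \mathit{\boldsymbol{H}}$ ($h$ is the number of nodes in the hash layer), and the threshold 0.5 was determined experimentally. Given a query clothing image $\mathit{\boldsymbol{q}}_{\rm up}$ (an upper-body garment is used as the example here; lower-body garments are handled in the same way), the Hamming distance is used to select from the dataset a candidate pool $ \mathit{\boldsymbol{P}}=\{ \mathit{\boldsymbol{I}}_{1}, \mathit{\boldsymbol{I}}_{2}, \ldots, \mathit{\boldsymbol{I}}_{n}\}$ of $n$ candidates, where $\mathit{\boldsymbol{I}}_{k}(k=1, \ldots, n)$ is a clothing image whose style is similar to that of the query. To further filter the images of similar style, the output of the fully connected layer preceding the hash layer $ \mathit{\boldsymbol{H}}$ is then used as the feature for ranking the candidates. Let $\mathit{\boldsymbol{V}}_{q}$ and $\mathit{\boldsymbol{V}}^{k}_{P}$ denote the feature vectors of the query image $\mathit{\boldsymbol{q}}_{\rm up}$ and of the $k$-th image in the candidate pool, respectively. The matching measure between the query $\mathit{\boldsymbol{q}}_{\rm up}$ and the $k$-th candidate is defined as the distance between their feature vectors: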

$ \boldsymbol{s}_{k}=\left\|\boldsymbol{V}_{q}-\boldsymbol{V}_{P}^{k}\right\| $

(6)

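The smaller the distance $\mathit{\boldsymbol{s}}_{k}$, the better the two clothing images match. The candidates in the pool are sorted in ascending order of $\mathit{\boldsymbol{s}}_{k}$.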
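The two-stage procedure of this subsection (binarize with Eq. (5), shortlist by Hamming distance, re-rank by Eq. (6)) can be sketched in NumPy. The function names, the argument layout (hash-layer outputs and deep features stored as rows of arrays), and the stable tie-breaking are illustrative assumptions.

```python
import numpy as np

def binarize(h_out, threshold=0.5):
    """Eq. (5): bit j is 1 when the hash-layer activation is >= threshold."""
    return (np.asarray(h_out, dtype=float) >= threshold).astype(np.uint8)

def match(query_h, query_v, db_h, db_v, n=3):
    """Return database indices ordered from best to worst match.

    Stage 1 keeps the n items closest to the query in Hamming distance.
    Stage 2 sorts that candidate pool by Euclidean feature distance, Eq. (6).
    """
    q_code, db_codes = binarize(query_h), binarize(db_h)
    hamming = np.count_nonzero(db_codes != q_code, axis=1)
    candidates = np.argsort(hamming, kind="stable")[:n]
    s = np.linalg.norm(np.asarray(db_v, float)[candidates] - np.asarray(query_v, float), axis=1)
    return candidates[np.argsort(s, kind="stable")]

if __name__ == "__main__":
    db_h = [[0.8, 0.2, 0.6, 0.1], [0.1, 0.9, 0.2, 0.8],
            [0.9, 0.6, 0.7, 0.3], [0.7, 0.4, 0.9, 0.9]]
    db_v = [[3.0, 0.0], [0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
    print(match([0.9, 0.1, 0.7, 0.2], [0.0, 0.0], db_h, db_v, n=3))
```

In the toy data, item 1 has the feature vector closest to the query but is never ranked, because its hash code differs from the query in every bit and it is dropped in stage 1. This is the source of both the speed-up and the accuracy loss at short code lengths.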

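2.4 Re-ranking of matching results

Query expansion (QE)^[24] can be applied to the matching results. Given the ranked list of database clothing images, sorted in ascending order of matching distance, the feature vectors of the top $t$ results are averaged ($t=5$ in this paper), and the averaged vector is used to query the candidates in the pool again.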
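The re-ranking step can be sketched as follows. The text only states that the top-$t$ feature vectors are averaged, so this sketch assumes the average excludes the original query vector and that the new distances are Euclidean; the function name and argument layout are also illustrative.

```python
import numpy as np

def query_expansion(ranked_ids, db_v, t=5):
    """Re-rank a candidate list using the mean feature of its top-t entries.

    ranked_ids: candidate indices sorted by ascending matching distance.
    db_v:       feature matrix with one row per database image.
    """
    db_v = np.asarray(db_v, dtype=float)
    ranked_ids = np.asarray(ranked_ids)
    expanded = db_v[ranked_ids[:t]].mean(axis=0)       # expanded query (assumed: top-t only)
    s = np.linalg.norm(db_v[ranked_ids] - expanded, axis=1)
    return ranked_ids[np.argsort(s, kind="stable")]    # ties keep their earlier order

if __name__ == "__main__":
    feats = [[0.0], [1.0], [2.0], [10.0]]
    print(query_expansion([0, 1, 2, 3], feats, t=3))
```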

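3 Experiments

To evaluate the proposed method, this section presents the experimental analysis in three parts: 1) the evaluation criteria; 2) an evaluation of clothing region detection against one baseline on three datasets; 3) an evaluation of the proposed matching system and its extension.

3.1 Evaluation criteria

Following the evaluation protocol of [25], matching performance is measured by Top-$k$ accuracy, the proportion of queries for which a correct match appears among the first $k$ returned results. In our setting, a match is counted as correct if at least one of the top 5 returned results is a garment that matches the query. The matching pairs here are the positive pairs used for training: besides the pairs that occur together in one original image, a number of positive pairs were also specified manually.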
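The Top-$k$ criterion can be written down in a few lines. The data layout below (one ranked list and one set of acceptable matches per query) and the function name are assumptions for illustration.

```python
def top_k_accuracy(rankings, positives, k=5):
    """Fraction of queries with at least one correct match in the top k results.

    rankings:  list of ranked item-id lists, one per query.
    positives: list of sets holding the item ids that match each query.
    """
    hits = sum(
        1
        for ranked, pos in zip(rankings, positives)
        if any(item in pos for item in ranked[:k])
    )
    return hits / len(rankings)

if __name__ == "__main__":
    rankings = [[3, 1, 2], [5, 6, 7], [9, 8, 4]]
    positives = [{2}, {1}, {9}]
    print(top_k_accuracy(rankings, positives, k=2))
    print(top_k_accuracy(rankings, positives, k=3))
```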

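3.2 Clothing region detection

For the region detection experiment, 10 000 images were annotated with upper- and lower-body regions for training. The proposed method is compared with the part-based baseline^[26] on three datasets, FClothes, CCP (clothing co-parsing)^[27], and Fashionista^[28]; in each dataset, 20% of the images are used for testing. Table 1 lists the average detection accuracy of the two methods on the three datasets, and the proposed method outperforms the method of [26] in every case. Specifically, for the method of [26], lower-body detection accuracy is higher than upper-body accuracy because the upper body undergoes large deformation, whereas the upper- and lower-body accuracies of the proposed method differ by less than 4 percentage points on every dataset. The proposed method therefore handles deformation better.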

Table 1 Detection accuracy of upper- and lower-body clothing regions (%)

Method	FClothes	Fashionista^[28]	CCP^[27]
Part-based^[26] (upper)	60.63	71.79	69.24
Part-based^[26] (lower)	83.67	83.63	86.75
Ours (upper)	92.05	92.43	90.46
Ours (lower)	92.81	89.94	94.37

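3.3 Clothing matching

FMatchNet is evaluated on the FClothes dataset. The training strategy follows the baseline^[12]. In the training set of 10 000 images, upper- and lower-body regions from the same image form positive pairs, and negative pairs are formed from regions of different images, whether of the same clothing category or of different categories. For example, the positive sample for $\mathit{\boldsymbol{q}}_{\rm up}$ is $\mathit{\boldsymbol{q}}_{\rm low}$ from the same image, and its negative samples are $\mathit{\boldsymbol{q}}'_{\rm up}$ or $\mathit{\boldsymbol{q}}'_{\rm low}$ drawn from other, randomly chosen images. To compare matching accuracy and time consumption fairly, the baseline is the strategic sampling strategy of [12], and the other strategies of [12] are compared as well. All networks are trained on the FClothes dataset. Table 2 compares the proposed method at different hash code lengths with the other methods, each with and without the QE algorithm.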

Table 2 Comparison between the proposed method and other methods

Hash code length $L$	0	0	0	0	0	0	8	8	16	16	32	32
Naive^[12]	√	√	-	-	-	-	-	-	-	-	-	-
Holdout-categories^[12]	-	-	√	√	-	-	-	-	-	-	-	-
Strategic^[12]	-	-	-	-	√	√	-	-	-	-	-	-
Ours	-	-	-	-	-	-	√	√	√	√	√	√
QE	-	√	-	√	-	√	-	√	-	√	-	√
Accuracy/%	25.28	26.34	28.32	31.26	37.89	40.07	34.52	36.49	48.75	50.81	52.68	53.40
Time consumption/ms	222.32	225.10	215.63	217.33	209.46	211.63	56.57	58.44	72.28	74.16	138.54	139.82
Note: $L$ is the hash code length ($L$=0 means that no hash code feature is used); the QE row indicates whether the QE algorithm is added to the method; √ means used and - means not used.

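Query expansion (QE) is also applied on top of the ranked matching results. With QE, time consumption rises slightly, but accuracy improves by roughly 0.7 to 2.9 percentage points over the same method without QE (Table 2). At hash code length $L$=8, time consumption is only about 0.27 times that of the baseline, but accuracy falls below the baseline; analysis suggests that a code this short cannot cover all the clothing styles. Both accuracy and time consumption increase with hash code length. $L$=32 gives the highest accuracy, but its time consumption is nearly twice that of $L$=16. For the FClothes dataset, $L$=16 is therefore the most suitable choice.

To assess FMatchNet more completely on the FClothes dataset, the Top-$k$ accuracy of matching was measured for hash codes of different lengths, as shown in Fig. 3. For a fixed $k$, accuracy rises with hash code length. The accuracy at $L$=16 is close to that at $L$=32, which suggests that a 16-bit code can already cover all the clothing styles. Fig. 4 shows example garments returned by the proposed method for different upper-body garments; most are visually well matched.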
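The accuracy and speed trade-off can be checked directly from Table 2. The short script below copies the no-QE columns of the table and reports, for each hash length, the accuracy gain over the Strategic baseline in percentage points and the speed-up factor; the dictionary layout and variable names are illustrative.

```python
# (accuracy in %, time in ms), copied from the no-QE columns of Table 2
results = {
    "Strategic (L=0)": (37.89, 209.46),
    "Ours (L=8)": (34.52, 56.57),
    "Ours (L=16)": (48.75, 72.28),
    "Ours (L=32)": (52.68, 138.54),
}

base_acc, base_ms = results["Strategic (L=0)"]

# For each of our settings: (accuracy gain in points, speed-up factor over the baseline)
summary = {
    name: (round(acc - base_acc, 2), round(base_ms / ms, 2))
    for name, (acc, ms) in results.items()
    if name.startswith("Ours")
}

if __name__ == "__main__":
    for name, (gain, speedup) in summary.items():
        print(f"{name}: {gain:+.2f} points, {speedup:.2f}x faster")
```

At $L$=16 the speed-up is about 2.9, consistent with the "nearly three times" figure, while $L$=8 is faster still but loses accuracy relative to the baseline.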

Fig. 3 The Top-$k$ accuracy of our method

Fig. 4 Some examples of the clothing returned by our method

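4 Conclusion

To address fast clothing matching at large scale, this paper proposes a new network, FMatchNet, that extracts features for fast clothing matching. FMatchNet extracts hash features and deep features in a fast style space: the hash features allow garments of the same style to be matched quickly, and the deep features then rank the garments within that style. In the experiments on the FClothes dataset, hash codes of three lengths were compared; with a 16-bit hash code, the proposed algorithm is nearly three times faster than the baseline algorithm and also more accurate. The experimental results show that the proposed method is feasible and effective. In addition, a fine-grained annotated clothing database, FClothes, was expanded. The method does not yet consider factors such as users' personal matching preferences; more personalized clothing matching will be explored in future work.

References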

[1] Lindeberg T. Scale invariant feature transform[J]. Scholarpedia, 2012, 7(5): #10491. [DOI:10.4249/scholarpedia.10491]

[2] Huang X, Ling Z G, Li X X. Discriminative deep feature learning method by fusing linear discriminant analysis for image recognition[J]. Journal of Image and Graphics, 2018, 23(4): 510–518. [DOI:10.11834/jig.170336]

[3] Ren S Q, He K M, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[C]//Proceedings of the 28th International Conference on Neural Information Processing Systems. Montreal, Canada: ACM, 2015: 91-99. https://arxiv.org/abs/1506.01497

[4] Liu Y J, Gao Y B, Feng S H, et al. Weather-to-garment: weather-oriented clothing recommendation[C]//Proceedings of 2017 IEEE International Conference on Multimedia and Expo. Hong Kong, China: IEEE, 2017: 181-186.[DOI: 10.1109/ICME.2017.8019476]

[5] Liu S, Song Z, Liu G C, et al. Street-to-shop: cross-scenario clothing retrieval via parts alignment and auxiliary set[C]//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, RI, USA: IEEE, 2012: 3330-3337.[DOI: 10.1109/CVPR.2012.6248071]

[6] Liu S, Feng J S, Zhang T Z, et al. Hi, magic closet, tell me what to wear![C]//Proceedings of the 20th ACM International Conference on Multimedia. Nara, Japan: ACM, 2012: 619-628.[DOI: 10.1145/2393347.2393433]

[7] Li Y C, Cao L L, Zhu J, et al. Mining fashion outfit composition using an end-to-end deep learning approach on set data[J]. IEEE Transactions on Multimedia, 2017, 19(8): 1946–1955. [DOI:10.1109/TMM.2017.2690144]

[8] Felzenszwalb P, McAllester D, Ramanan D. A discriminatively trained, multiscale, deformable part model[C]//Proceedings of 2008 IEEE Conference on Computer Vision and Pattern Recognition. Anchorage, AK, USA: IEEE, 2008: 1-8.[DOI: 10.1109/CVPR.2008.4587597]

[9] Zhang Z H, Zhou C L, Liang Y. An optimized clothing classification algorithm based on residual convolutional neural network[J]. Computer Engineering and Science, 2018.

[10] Yang T Q, Huang S X. Application of improved convolution neural network in classification and recommendation[J]. Application Research of Computers, 2018, 35(4): 974–977. [DOI:10.3969/j.issn.1001-3695.2018.04.003]

[11] Yuan W F, Guo J M, Su Z, et al. Clothing retrieval by deep muti-label parsing and Hashing[J]. Journal of Image and Graphics, 2019, 24(2): 159–169. [DOI:10.11834/jig.180361]

[12] Veit A, Kovacs B, Bell S, et al. Learning visual clothing style with heterogeneous dyadic co-occurrences[C]//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015: 4642-4650.[DOI: 10.1109/ICCV.2015.527]

[13] Song X M, Feng F L, Liu J H, et al. NeuroStylist: neural compatibility modeling for clothing matching[C]//Proceedings of the 25th ACM International Conference on Multimedia. Mountain View, California, USA: ACM, 2017: 753-761.[DOI: 10.1145/3123266.3123314]

[14] Gionis A, Indyk P, Motwani R. Similarity search in high dimensions via Hashing[C]//Proceedings of the 25th International Conference on Very Large Data Bases. Edinburgh, Scotland, UK: ACM, 1999: 518-529. https://wenku.baidu.com/view/8fa10409bb68a98271fefa64.html

[15] Weiss Y, Torralba A, Fergus R. Spectral Hashing[C]//Proceedings of the 21st International Conference on Neural Information Processing Systems. Vancouver, British Columbia, Canada: ACM, 2008: 1753-1760. http://people.csail.mit.edu/torralba/publications/spectralhashing.pdf

[16] Gong Y C, Lazebnik S. Iterative quantization: a procrustean approach to learning binary codes[C]//Proceedings of 2011 IEEE Conference on Computer Vision and Pattern Recognition. Colorado Springs, CO, USA: IEEE, 2011: 817-824.[DOI: 10.1109/CVPR.2011.5995432]

[17] Norouzi M, Fleet D J. Minimal loss Hashing for compact binary codes[C]//Proceedings of the 28th International Conference on Machine Learning. Bellevue, Washington, USA: ACM, 2011: 353-360. http://www.cs.toronto.edu/~norouzi/research/papers/min_loss_hash.pdf

[18] Liu W, Wang J, Ji R R, et al. Supervised Hashing with kernels[C]//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, RI, USA: IEEE, 2012: 2074-2081.[DOI: 10.1109/CVPR.2012.6247912]

[19] Kulis B, Darrell T. Learning to Hash with binary reconstructive embeddings[C]//Proceedings of the 22nd International Conference on Neural Information Processing Systems. Vancouver, British Columbia, Canada: ACM, 2009: 1042-1050. https://www2.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-101.pdf

[20] Lai H J, Pan Y, Liu Y, et al. Simultaneous feature learning and Hash coding with deep neural networks[C]//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE, 2015: 3270-3278.[DOI: 10.1109/CVPR.2015.7298947]

[21] Fang Z W, Liu J, Wang Y H, et al. Object-aware deep network for commodity image retrieval[C]//Proceedings of 2016 ACM on International Conference on Multimedia Retrieval. New York, USA: ACM, 2016: 405-408.[DOI: 10.1145/2911996.2912027]

[22] Lin K, Yang H F, Hsiao J H, et al. Deep learning of binary Hash codes for fast image retrieval[C]//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Boston, MA, USA: IEEE, 2015: 27-35.[DOI: 10.1109/CVPRW.2015.7301269]

[23] Melekhov I, Kannala J, Rahtu E. Siamese network features for image matching[C]//Proceedings of 2016 International Conference on Pattern Recognition. Cancun, Mexico: IEEE, 2016: 378-383.[DOI: 10.1109/ICPR.2016.7899663]

[24] Chum O, Philbin J, Sivic J, et al. Total recall: automatic query expansion with a generative feature model for object retrieval[C]//Proceedings of 2007 IEEE International Conference on Computer Vision. Rio de Janeiro, Brazil: IEEE, 2007: 1-8.[DOI: 10.1109/ICCV.2007.4408891]

[25] Kiapour M H, Han X F, Lazebnik S, et al. Where to buy it: matching street clothing photos in online shops[C]//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015: 3343-3351.[DOI: 10.1109/ICCV.2015.382]

[26] Yang Y, Ramanan D. Articulated pose estimation with flexible mixtures-of-parts[C]//Proceedings of 2011 IEEE Conference on Computer Vision and Pattern Recognition. Colorado Springs, CO, USA: IEEE, 2011: 1385-1392.[DOI: 10.1109/CVPR.2011.5995741]

[27] Liang X D, Lin L, Yang W, et al. Clothes co-parsing via joint image segmentation and labeling with application to clothing retrieval[J]. IEEE Transactions on Multimedia, 2016, 18(6): 1175–1186. [DOI:10.1109/TMM.2016.2542983]

[28] Yamaguchi K, Kiapour M H, Ortiz L E, et al. Parsing clothing in fashion photographs[C]//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, RI, USA: IEEE, 2012: 3570-3577.[DOI: 10.1109/CVPR.2012.6248101]