Clothing parsing of Chinese minorities via the fusion of visual style and label constraints

Zhang Qian1, Liu Li1,2, Gan Lin1, Fu Xiaodong1,2, Liu Lijun1,2, Huang Qingsong1,2 (1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China; 2. Computer Technology Application Key Laboratory of Yunnan Province, Kunming 650500, China)

Abstract
Objective Minority clothing has complex garment structures and diverse visual styles. The lack of semantic labels for minority clothing, the complexity of local features, and mutual interference between semantic labels lead to low accuracy and precision in minority clothing image parsing. We therefore propose a minority clothing image parsing method that fuses visual style and label constraints. Method First, on a self-built clothing image dataset covering 55 minority groups, we define general semantic labels and group-specific semantic labels according to basic garment structure, wearing region, accessories, and visual style, and set four annotation pairs, eight annotation points in total. Then, combining the self-defined semantic labels with training images carrying annotation pairs, we add a visual style component to the deep fully convolutional network SegNet to fuse local and global features, and introduce attribute prediction, style prediction, and a triplet loss to obtain a preliminary parse of the input image. Finally, a label constraint network refines the preliminary result, avoiding mutual interference between labels, and yields the final parsing result. Result Experiments on the constructed minority clothing image dataset show that the annotation pairs improve the detection accuracy of local features, the visual style network effectively fuses the global and local features of minority clothing, and the label constraint network resolves mutual interference between labels. Combining the two networks markedly improves the mean precision of minority clothing parsing, with pixel accuracy reaching 90.54%. Conclusion The proposed method, which fuses visual style and label constraints, improves the accuracy and precision of minority clothing image parsing and is valuable for passing on Chinese culture and protecting intangible cultural heritage.
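The triplet loss mentioned above can be sketched as follows. This is a generic hinge-style formulation with an assumed margin of 1.0, intended only to illustrate the idea of pulling same-style clothing features together and pushing different-style features apart; it is not the paper's exact implementation.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss over feature vectors.

    anchor/positive share a visual style; negative has a different style.
    The loss is zero once the negative is farther than the positive by
    at least `margin` (squared Euclidean distance).
    """
    d_pos = np.sum((anchor - positive) ** 2)  # distance to same-style sample
    d_neg = np.sum((anchor - negative) ** 2)  # distance to different-style sample
    return max(d_pos - d_neg + margin, 0.0)
```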
Keywords
Clothing parsing of Chinese minorities via the fusion of visual style and label constraints

Zhang Qian1, Liu Li1,2, Gan Lin1, Fu Xiaodong1,2, Liu Lijun1,2, Huang Qingsong1,2(1.Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China;2.Computer Technology Application Key Laboratory of Yunnan Province, Kunming 650500, China)

Abstract
Objective Many minority groups live in China, and the visual styles of their clothing differ considerably. Combining clothing parsing with the clothing culture of these minority groups plays an important role in the digital protection of minority clothing images and the inheritance of their culture. However, a complete dataset of Chinese minority clothing images remains lacking. Minority clothing has complex structures and diverse visual styles. Semantic labels that distinguish the clothing of different minorities are lacking, and defining semantic labels for ethnic accessories is challenging. Existing clothing image parsing methods have difficulty describing information such as the local details, styles, and ethnic characteristics of minority clothing, and mutual interference between semantic labels leads to unsatisfactory accuracy and precision in clothing image parsing. Therefore, we proposed a clothing parsing method based on visual style and label constraints. Method Our method primarily parsed minority clothing through its visual style by fusing local and global features. A label constraint network was then used to suppress redundant labels and to optimize the preliminary parsing results. First, we defined the general semantic labels of minority clothing; the distinctive semantic labels were defined in accordance with the combination preferences of semantic labels. We set four annotation pairs based on human body parts, with a total of eight annotation points. Each pair of annotations corresponds to a set of key points on the clothing structure: the upper garment was marked with left/right collar, left/right sleeve, and left/right top-hem points, and the lower garment was marked with left/right bottom-hem points. We also marked the visibility of each annotation point and used it to determine whether occlusion occurred in the clothing.
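The annotation-pair scheme above can be sketched as a small data structure. The field names and coordinates here are illustrative assumptions, not the paper's released data format; the point is that each pair carries a per-point visibility flag from which occlusion can be inferred.

```python
from dataclasses import dataclass

@dataclass
class AnnotationPoint:
    x: int
    y: int
    visible: bool  # False when the point is occluded in the image

# Four pairs (eight points): collar, sleeve, and top hem on the upper garment,
# bottom hem on the lower garment. Coordinates are made up for illustration.
annotation_pairs = {
    "collar":     (AnnotationPoint(120, 60, True),  AnnotationPoint(180, 60, True)),
    "sleeve":     (AnnotationPoint(60, 150, True),  AnnotationPoint(240, 150, False)),
    "top_hem":    (AnnotationPoint(110, 260, True), AnnotationPoint(190, 260, True)),
    "bottom_hem": (AnnotationPoint(100, 420, True), AnnotationPoint(200, 420, True)),
}

def occluded_pairs(pairs):
    """Names of pairs in which at least one point is invisible (occluded)."""
    return [name for name, (left, right) in pairs.items()
            if not (left.visible and right.visible)]
```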
Second, combining the training images carrying annotation pairs with the self-defined semantic labels, a visual style network was added on the basis of the fully convolutional network SegNet. A branch was built on the last convolutional layer of SegNet and divided into three parts, handling, respectively, the position and visibility of the annotation pairs, the local features of the clothing, and its global features. The local and global feature branches were fed into the "fc7_fusion" layer for fusion, and the fused style features were returned to the SegNet network through a deconvolution layer to obtain preliminary parsing results. Finally, a label mapping function converted the preliminary parsing result into a label vector whose length equals the number of labels; each element indicates whether the corresponding label appears in the preliminary parsing result. The label vector was then compared with the true semantic labels in the training set, and the labels were corrected to suppress the probability scores of redundant labels. By comparing the labels of the preliminary parsing results with those of the training images, the label constraint network eliminated redundant and erroneous labels, avoided mutual interference between labels, and increased the accuracy of the parsing result. In addition, we constructed a clothing image dataset covering 55 minority groups. The primary sources were online shopping sites, such as Taobao, Tmall, and JD, supplemented with images from other platforms, such as Baidu Pictures, blogs, and forums. A total of 61 710 images were collected, with at least 500 images per minority group. Result The proposed method was validated on the constructed minority clothing image dataset. Experimental results showed that the annotation pairs improved the detection accuracy of clothing visual style features.
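The label-mapping and suppression step described above can be sketched as follows. The label indices and the elementwise masking are illustrative assumptions about how a preliminary parsing map could be reduced to a label vector and how probabilities of labels absent from the ground truth could be zeroed out; the paper's exact network formulation may differ.

```python
import numpy as np

NUM_LABELS = 6  # e.g. background, top, skirt, hat, collar, belt (illustrative)

def label_vector(parsing_map, num_labels=NUM_LABELS):
    """Binary vector: 1 where a label appears anywhere in the parsing map."""
    vec = np.zeros(num_labels, dtype=int)
    vec[np.unique(parsing_map)] = 1
    return vec

def constrain(prob_maps, true_labels):
    """Suppress per-pixel probability scores of labels absent from the
    ground-truth label set, leaving the remaining labels untouched.

    prob_maps: array of shape (num_labels, H, W).
    """
    mask = np.zeros(prob_maps.shape[0])
    mask[list(true_labels)] = 1.0
    return prob_maps * mask[:, None, None]  # broadcast mask over H and W
```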
The visual style network efficiently fused local and global features, and the label constraint network effectively resolved the mutual interference between labels. The proposed method improved parsing accuracy on large-scale clothing labels, particularly on skirts with considerable differences in pattern texture and color blocks, as well as on small accessory labels, such as hats and collars. The minority clothing parsing results improved significantly, with the pixel accuracy reaching 90.54%. Conclusion Minority clothing is characterized by complicated styles and accessories, a lack of semantic labels, and complex labels that interfere with one another. We therefore proposed a clothing parsing method that fuses visual style with label constraints. We constructed a dataset of minority clothing images, defined generic and distinctive semantic labels, and made pixel-level semantic annotations and annotation pairs on the training images. We then built a visual style network based on SegNet to obtain preliminary parsing results. Finally, the mutual interference between semantic labels was resolved through a label constraint network to obtain the final parsing result. Compared with other clothing parsing methods, our method improved the accuracy of minority clothing image parsing, which is significant for inheriting culture and protecting intangible cultural heritage. However, some parsing results of this method are still not ideal, particularly the accuracy on small accessories, and the semantic labels of minority clothing remain imperfect and insufficiently precise. Subsequent work will continue to improve the dataset and address these issues to further improve the accuracy of minority clothing parsing.
Keywords
