Dual branch network for human pose estimation in dressing scene
2022, Vol. 27, No. 4, pp. 1110-1124
Received: 2020-11-04
Revised: 2021-02-24
Accepted: 2021-03-03
Published in print: 2022-04-16
DOI: 10.11834/jig.200642

Objective
Human pose estimation aims to recognize and localize human body joints in images of diverse scenes and to optimize joint localization accuracy. To address the low accuracy of human pose estimation caused by diverse clothing styles, background interference, and varied dressed postures, this paper takes fashion street-shot images as an example and proposes a dual-branch network for human pose estimation in dressing scenes.
Method
Human detection is first performed on the input image to obtain the dressed human body region, which is fed into both a pose representation branch and a dress-part segmentation branch. The pose representation branch augments a stacked hourglass network with multi-scale losses and feature fusion to output joint score maps, countering the interference of diverse clothing styles and complex backgrounds with joint feature extraction; a pose category loss function defined from pose clustering handles the varied viewing angles of dressed postures. The dress-part segmentation branch fuses shallow and deep features of a residual network to obtain dress-part score maps. The segmentation results are then used to constrain joint localization, mitigating clothing occlusion of joints. Finally, pose refinement yields the human pose estimation result.
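The multi-scale loss over the stacked hourglass stages can be sketched as below. This is an illustrative assumption, not the authors' exact formulation: a weighted sum of per-stack mean-squared errors between each stack's joint score maps and the ground-truth Gaussian heatmaps.

```python
import numpy as np

def multi_scale_heatmap_loss(stage_heatmaps, target, weights=None):
    """Hypothetical multi-scale supervision for a stacked hourglass network.

    stage_heatmaps: list of (K, H, W) arrays, one per hourglass stack
    target:         (K, H, W) ground-truth Gaussian heatmaps
    weights:        optional per-stack weights (assumed uniform here)
    """
    if weights is None:
        weights = [1.0] * len(stage_heatmaps)
    # Sum the MSE of every intermediate stack so early stages are also supervised.
    return float(sum(w * np.mean((h - target) ** 2)
                     for w, h in zip(weights, stage_heatmaps)))
```

Supervising every intermediate stack, rather than only the final one, is the standard motivation for intermediate losses in hourglass-style networks.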
Result
The proposed method is validated on a constructed dressed-person image dataset. Experimental results show that the pose representation branch effectively improves joint localization accuracy, and the dress-part segmentation branch effectively avoids joint mislocalization in dressing scenes. After refinement with dress-part segmentation, pose estimation accuracy rises to 92.5%.
Conclusion
The proposed method effectively improves human pose estimation accuracy in dressing scenes and meets the needs of practical applications such as virtual try-on.
Objective
Human pose estimation aims to recognize and localize human body joints in images of diverse scenes and to optimize joint positioning accuracy. Current methods perform well in dressing scenes where clothing only occasionally occludes body joints, but fail in complicated dressing scenes such as fashion street shots. Two main difficulties lower the accuracy of joint positioning and pose estimation in dressing scenes. First, diverse clothing styles leave body joints partially occluded, and varied texture and color information causes joint localization to fail. Second, body postures in dressing scenes are highly varied. A dual-branch network is therefore proposed for human pose estimation in the dressing scene.
Method
First, human detection is applied to the input image to obtain the dressed human body region, which is fed into a pose representation branch and a dress-part segmentation branch. Next, to reduce the interference of varied clothing styles and complex backgrounds with joint feature extraction, the pose representation branch adds multi-scale losses and feature fusion to a stacked hourglass network and generates joint score maps. To handle the varied viewing angles of human poses in dressing scenes, a pose category loss function based on pose clustering is introduced. Then, the dress-part segmentation branch fuses shallow and deep features of a residual network and, supervised by dress-part labels, produces dress-part score maps. Finally, to resolve clothing occlusion of joints, the segmentation result constrains the positions of human body joints, and pose refinement yields the final human pose estimation.
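The segmentation-constrained refinement step can be sketched as follows. This is a hypothetical reading of the constraint, assuming each joint is paired with a binary mask of its plausible dress-part region and that responses outside the mask are simply down-weighted; the penalty factor and mask pairing are assumptions, not the paper's exact rule.

```python
import numpy as np

def refine_joint(heatmap, part_mask, penalty=0.5):
    """Suppress joint responses outside the associated dress-part region,
    then take the peak of the constrained score map.

    heatmap:   (H, W) joint score map
    part_mask: (H, W) binary mask of the plausible region for this joint
    penalty:   assumed down-weighting factor for out-of-region responses
    """
    constrained = np.where(part_mask > 0, heatmap, heatmap * penalty)
    y, x = np.unravel_index(int(np.argmax(constrained)), constrained.shape)
    return int(x), int(y)
```

A false peak on a high-contrast clothing pattern outside the dress-part region is thus outvoted by a weaker but in-region response, which is the mislocalization case the segmentation branch targets.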
Result
The method is validated on the constructed image dataset of dressed people. Results show that the pose representation branch effectively improves the positioning accuracy of human body joints; in particular, the pose category loss function improves the robustness of multi-view pose estimation. With refinement based on the semantic segmentation of dressed parts, the estimation accuracy of the human body pose improves to 92.5%.
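A PCK-style joint accuracy, a common metric for this task, can be computed as sketched below. Treating the reported 92.5% as such a metric is an assumption; the paper's exact threshold and normalization length are not specified here.

```python
import numpy as np

def pck(pred, gt, ref_len, alpha=0.5):
    """Percentage of Correct Keypoints (illustrative).

    pred, gt: (N, K, 2) predicted and ground-truth joint coordinates
    ref_len:  (N,) per-sample reference lengths (e.g. head or torso size)
    alpha:    fraction of ref_len within which a joint counts as correct
    """
    dist = np.linalg.norm(pred - gt, axis=-1)           # (N, K) distances
    return float((dist <= alpha * ref_len[:, None]).mean())
```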
Conclusion
To address the low accuracy of human pose estimation caused by varied clothing styles and varied body postures in dressing scenes, a dual-branch network for human pose estimation is proposed. To improve joint positioning accuracy, a pose representation model is constructed to fuse global and local features, and a pose category loss is introduced to improve robustness to multiple viewing angles. The semantic segmentation of dressed parts is integrated to constrain the positions of body joints, which effectively improves pose estimation accuracy in dressing scenes. Experiments on the constructed dressed-person image dataset show that the correct estimation rate of joint points reaches 92.5%. Accuracy remains low, however, when dresses, overcoats, and multi-layer clothing heavily occlude body joints, and joint positioning still needs improvement when people carry bags and other accessories. Future work will further improve pose estimation accuracy in diverse dressing scenes.