Clothed feature learning for single-view 3D human reconstruction
2024, Vol. 29, No. 9, pp. 2610-2624
Received: 2023-10-11
Revised: 2024-02-23
Print publication date: 2024-09-16
DOI: 10.11834/jig.230623
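Objective
Single-view clothed human reconstruction suffers from limb occlusion and complex clothed poses. Existing methods can accurately extract and represent only the visual features in clothed human images and do not account for the dynamic details induced by complex clothed poses, so they struggle to generate clothed human models with dynamic folds. We therefore propose a clothed feature learning method for single-view 3D human reconstruction.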
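Method
First, limb features are represented for the single-view images in a clothed human image set, and the clothed pose features of the human body are extracted through 2D joint prediction and deep regression of pose features. Second, based on the clothed pose features, a clothing fold sampling space centered on the flexible deformation joints and a flexible deformation loss function are defined; by introducing a clothing template, flexible clothing deformation is learned from the input ground-truth clothed human models to obtain clothing fold features. Third, a human shape feature learning module that combines clothed human pixels and voxels is built from pose parameter regression, feature-map sampling features, and an encoder-decoder, yielding clothed human shape features. Finally, the fold features, the clothed human shape features, and the computed 3D sampling space are combined, and the 3D human body is reconstructed by defining a signed distance field, which outputs the final clothed human model.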
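The idea of a fold sampling space centered on flexible deformation joints can be illustrated with a small sketch. The abstract does not specify the shape or size of this space, so the ball-shaped region, the `radius` and `n_per_joint` parameters, the function name `sample_fold_space`, and the joint indices below are all assumptions made for illustration, not the authors' definition.

```python
import numpy as np

def sample_fold_space(joints, flex_ids, radius=0.1, n_per_joint=64, seed=0):
    """Sample 3D points inside a ball around each selected joint.

    joints:   (J, 3) array of 3D joint positions.
    flex_ids: indices of the joints treated as flexible deformation joints
              (hypothetical choice, not taken from the paper).
    Returns an array of shape (K, n_per_joint, 3), K = len(flex_ids).
    """
    rng = np.random.default_rng(seed)
    centers = joints[flex_ids]                                  # (K, 3)
    # Random unit directions on the sphere.
    dirs = rng.normal(size=(len(flex_ids), n_per_joint, 3))
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    # Cube-root scaling gives uniform density inside the ball.
    radii = radius * rng.uniform(size=(len(flex_ids), n_per_joint, 1)) ** (1.0 / 3.0)
    return centers[:, None, :] + dirs * radii

# Toy skeleton: 24 joints stacked along the y axis (placeholder data).
joints = np.zeros((24, 3))
joints[:, 1] = np.linspace(0.0, 1.7, 24)
flex_ids = [4, 5, 18, 19]  # hypothetical knee and elbow indices
points = sample_fold_space(joints, flex_ids)
print(points.shape)
```

Restricting later feature learning to such a region is one way to focus on details near the joints where clothing is expected to fold; the actual sampling rule in the paper may differ.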
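Result
To validate the method, comparative experiments were conducted on the public THuman2.0 dataset. The results show that the pose feature learning module helps reconstruct complete limbs and correct poses, and that refining the shape features with fold feature learning yields high-precision reconstructions. Compared with the second-best of the current state-of-the-art single-view 3D human reconstruction methods, the proposed method reduces the point-to-surface distance and the Chamfer distance by 4.4% and 2.6%, respectively.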
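For readers unfamiliar with the two reconstruction metrics, the following is a minimal sketch of how they are commonly computed on sampled point sets. Conventions vary between papers (squared or unsquared distances, sums or means), and the abstract does not state which one was used, so the unsquared mean form here is an assumption. The point-to-surface distance is approximated by the distance to the nearest sampled surface point rather than to the exact triangle; the function names are illustrative.

```python
import numpy as np

def nearest_distances(src, dst):
    """For each point in src (N, 3), distance to its nearest point in dst (M, 3)."""
    diff = src[:, None, :] - dst[None, :, :]   # (N, M, 3) pairwise offsets
    dists = np.linalg.norm(diff, axis=-1)      # (N, M) pairwise distances
    return dists.min(axis=1)

def chamfer_distance(pred_pts, gt_pts):
    """Symmetric Chamfer distance: average of the two one-directional means."""
    return 0.5 * (nearest_distances(pred_pts, gt_pts).mean()
                  + nearest_distances(gt_pts, pred_pts).mean())

def p2s_distance(pred_pts, gt_surface_pts):
    """Approximate point-to-surface distance using a sampled ground-truth surface."""
    return nearest_distances(pred_pts, gt_surface_pts).mean()

# Toy data: the prediction is the ground truth shifted by 0.1 along z.
gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
pred = gt + np.array([0.0, 0.0, 0.1])
cd = chamfer_distance(pred, gt)
p2s = p2s_distance(pred, gt)
print(cd, p2s)
```

The brute-force pairwise computation is quadratic in the number of points; for dense meshes a spatial index such as a k-d tree is the usual choice.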
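Conclusion
The proposed clothed feature learning method effectively learns the clothed features needed for single-view 3D human reconstruction and generates clothed human models with complex poses and dynamic folds.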
Objective
Clothed human reconstruction is an important problem in computer vision and computer graphics. It aims to generate three-dimensional (3D) human body models, including clothes and accessories, and is widely used in virtual reality, digital humans, computer-aided 3D clothing design, film and television special effects production, and other scenarios. Single-view images are far easier to obtain on the internet than multiview images, and using them greatly relaxes the usage conditions and reduces the hardware cost of reconstruction. We therefore take a single-view image as input, establish a complete mapping between the single-view human image and the human shape, and restore the 3D shape and geometric details of the human body. Most methods based on parametric models can only predict the shape and posture of a human body with a smooth surface, whereas nonparametric methods lack a fixed mesh topology when generating fine geometric shapes. Combining a parametric human model with an implicit function enables high-precision 3D human model extraction. Because clothing deforms flexibly and dynamically as the human posture changes, most methods focus on obtaining the fold details of a clothed human model from 3D mesh deformation. With the assistance of a clothing template, the clothing can be separated from the human body, and the flexible deformation of the clothing caused by human posture can be obtained directly with a learning-based method. Because of limb overlap, occlusion, and complex clothed postures in single-view 3D human reconstruction, obtaining a geometric shape representation under various clothed postures and viewing angles is difficult.
Moreover, existing methods can accurately extract and represent only the visual features of clothed human images and do not consider the dynamic details caused by complex clothed postures. As a result, representing and learning the posture-related clothed features of a single-view clothed human, and generating a clothed mesh with complex posture and dynamic folds, remain difficult. In this paper, we propose a clothed feature learning method for single-view 3D human reconstruction.
Method
We propose a feature learning approach to reconstruct a clothed human from a single-view image. The experiments were run on two NVIDIA GeForce GTX 1080 Ti GPUs. To analyze the physical features of clothing, we used the Clothing Co-Parsing fashion street photography dataset, which includes 2,098 human images. The Human3.6M dataset was used to learn the posture features of the human body, and the 3DPW dataset, captured in the wild, served as the test set. For fold feature learning, we used subjects 00096 and 00159 of the CAPE dataset. To train the clothed mesh more effectively, we selected 150 meshes close to the dressed posture from the THuman2.0 dataset as the training set for shape feature learning. First, we represented the limb features of the single-view image and extracted the clothed human pose features through 2D joint prediction and deep regression of pose features. Then, based on these pose features, we defined a clothing fold sampling space centered on the flexible deformation joints and a flexible deformation loss function. Flexible clothing deformation was learned by introducing a clothing template to the input ground-truth clothed human model, and only the crucial details inside the sampling space were used to obtain the fold features. Afterward, the human shape feature learning module was constructed by combining posture parameter regression, feature-map sampling features, and an encoder-decoder. Pixel- and voxel-aligned features were learned from the corresponding images and meshes in the 3D human mesh dataset, and the shape features of the human body were decoded. Finally, by combining the fold features, the shape features of the clothed human, and the computed 3D sampling space, the 3D human mesh was reconstructed by defining a signed distance field, and the final clothed human model was output.
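The final step relies on a signed distance field (SDF), whose zero level set is the reconstructed surface. The sketch below illustrates only that representation: it evaluates an SDF on a regular grid and marks the voxels where the sign changes. An analytic sphere stands in for the learned network, and the grid resolution, bounds, and helper names are assumptions chosen for illustration; the sign convention (negative inside) is also an assumption, since the abstract does not state one.

```python
import numpy as np

def sphere_sdf(points, center, radius):
    """Analytic stand-in for a learned SDF: negative inside, positive outside."""
    return np.linalg.norm(points - center, axis=-1) - radius

def sample_grid(resolution, lo=-1.0, hi=1.0):
    """Regular grid of query points with shape (R, R, R, 3)."""
    axis = np.linspace(lo, hi, resolution)
    return np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)

def surface_voxels(sdf_values):
    """Mark voxels whose inside/outside state differs from the next voxel along any axis."""
    inside = sdf_values < 0
    crossing = np.zeros_like(inside)
    crossing[:-1, :, :] |= inside[:-1, :, :] != inside[1:, :, :]
    crossing[:, :-1, :] |= inside[:, :-1, :] != inside[:, 1:, :]
    crossing[:, :, :-1] |= inside[:, :, :-1] != inside[:, :, 1:]
    return crossing

grid = sample_grid(17)                        # grid spacing is 0.125
sdf = sphere_sdf(grid, np.zeros(3), 0.5)
mask = surface_voxels(sdf)
print(int(mask.sum()), "voxels straddle the zero level set")
```

In practice the zero level set is usually converted to a triangle mesh with a marching cubes implementation rather than a voxel mask; the mask here only shows where that surface lies.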
Result
We evaluated posture feature learning on the 3DPW dataset and single-view 3D clothed human reconstruction on the THuman2.0 dataset. On 3DPW, we compared our results with those of three methods. The mean per joint position error (MPJPE) and the Procrustes-aligned MPJPE (PA-MPJPE) were used to evaluate the differences between the predicted 3D joints and the ground truth. The mean per vertex position error (MPVPE) was used to evaluate the difference between the predicted SMPL 3D human shape and the ground-truth mesh. Compared with the second-best model, our method decreased the error by 1.28 on MPJPE, 1.66 on PA-MPJPE, and 2.38 on MPVPE, and reduced the average error by 2.4%. For the 3D reconstruction of the clothed human body, we conducted comparative experiments with four methods on the THuman2.0 dataset. We used the Chamfer distance (CD) and the point-to-surface distance (P2S) in 3D space to evaluate the gap between the reconstructed and ground-truth meshes. Compared with the second-best model, the P2S of the reconstructed result was reduced by 4.4% and the CD by 2.6%. The experimental results show that the posture feature learning module contributes to the reconstruction of complete limbs and correct posture, and that optimizing the learned shape features with fold feature learning yields high-precision reconstruction results.
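The pose metrics can be made concrete with a short sketch. MPJPE averages the Euclidean distance between corresponding joints, and PA-MPJPE does the same after a similarity (Procrustes) alignment that removes global scale, rotation, and translation. MPVPE follows the MPJPE formula applied to mesh vertices instead of joints. The function names and toy data are illustrative, and the evaluation protocol details of the paper (units, root alignment) are not given in the abstract.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Euclidean distance between corresponding points in (J, 3) arrays."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def procrustes_align(pred, gt):
    """Align pred to gt with the best similarity transform (scale, rotation, translation)."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    # Optimal rotation from the SVD of the cross-covariance matrix.
    u, s, vt = np.linalg.svd(p.T @ g)
    # Flip the last singular direction if needed so the result is a proper rotation.
    d = np.sign(np.linalg.det(u @ vt))
    sign_fix = np.array([1.0, 1.0, d])
    rot = u @ np.diag(sign_fix) @ vt
    scale = (s * sign_fix).sum() / (p ** 2).sum()
    return scale * p @ rot + mu_g

def pa_mpjpe(pred, gt):
    """MPJPE after Procrustes alignment of the prediction to the ground truth."""
    return mpjpe(procrustes_align(pred, gt), gt)

# Toy data: the prediction is a scaled, rotated, and shifted copy of the ground truth.
gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
c, s_ = np.cos(0.3), np.sin(0.3)
rz = np.array([[c, -s_, 0.0], [s_, c, 0.0], [0.0, 0.0, 1.0]])
pred = 2.0 * gt @ rz + np.array([0.5, -0.2, 0.1])
print(mpjpe(pred, gt), pa_mpjpe(pred, gt))
```

Because the toy prediction differs from the ground truth only by a similarity transform, its MPJPE is large while its PA-MPJPE is numerically zero, which is why the two metrics are reported together.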
Conclusion
The proposed clothed feature learning method for single-view 3D human reconstruction effectively learns clothed features and generates clothed human reconstruction results with complex posture and dynamic folds.