A survey of deep learning based two-dimensional human pose estimation methods

Kong Yinghui; Qin Yinfeng; Zhang Ke

doi:10.11834/jig.220436


Deep learning based two-dimension human pose estimation: a critical analysis
2023, Vol. 28, No. 7, pp. 1965-1989
Print publication date: 2023-07-16

Kong Yinghui, Qin Yinfeng, Zhang Ke. 2023. Deep learning based two-dimension human pose estimation: a critical analysis. Journal of Image and Graphics, 28(07): 1965-1989. DOI: 10.11834/jig.220436

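Abstract (translated from the Chinese)

Human pose estimation is an important task in computer vision. Traditional pose estimation methods have difficulty separating targets from backgrounds in complex scenes, are easily affected by manually specified prior information, and are inefficient. As artificial intelligence has developed and deep learning has matured, deep learning based human pose estimation methods have surpassed traditional methods in precision, speed, and other measures. In recent years, two-dimensional human pose estimation models, which serve as the basis of three-dimensional human pose estimation, have made considerable progress in handling crowding and occlusion. However, most network models are convolutional neural network (CNN) models with too many layers, which greatly affects network speed. Driven by the practical need for deployment at the edge, making two-dimensional human pose estimation networks lightweight has become a research hotspot with potential for innovative applications. Based on their development history and optimization trends, deep learning based two-dimensional human pose estimation models can be divided into three categories: single-person pose estimation, multi-person pose estimation, and lightweight human pose estimation. This paper summarizes the convolutional neural network models used by each category, analyzes the characteristics of the network models, and compares the performance of the estimation methods. Although the structural design of deep convolutional neural network (DCNN) models is increasingly diverse, deep learning network models still have limitations when handling human pose estimation tasks. This paper discusses in depth the techniques adopted by two-dimensional human pose estimation models and their open problems, and gives possible directions for future research.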

Abstract

Human pose estimation in computer vision locates the human skeleton in an image or video. The resulting pose information can be used to estimate the pose itself or, through the positional relationships between key body regions, to analyze a specific pose or action. Action recognition and pose tracking built on human pose estimation are developing rapidly. Conventional pose estimation methods consist of two stages: object detection and pose estimation. Object detection relies on segmentation, matching, or statistical learning; it struggles to separate targets from backgrounds in complex scenes and is sensitive to manually specified prior information. Building training sample libraries and classifiers is also time-consuming and labor-intensive. The pose estimation stage uses model-based or model-free methods; it propagates errors from object detection and depends on a large amount of hand-crafted constraint information, and its efficiency remains low. With the development of artificial intelligence (AI), deep learning has matured, and deep learning based human pose estimation methods now outperform conventional methods in both precision and speed. Human pose estimation is generally divided into two-dimensional and three-dimensional pose estimation. Two-dimensional human pose estimation, which serves as the basis of three-dimensional estimation, has made considerable progress in handling crowding and occlusion. However, most network models are very deep convolutional neural network (CNN) models, and this depth substantially reduces network speed. Because practical applications need to be deployed at the edge, lightweight two-dimensional human pose estimation networks have become a research focus.
We review the development and optimization trends of deep learning based two-dimensional human pose estimation models as reported in the literature. The models fall into three categories: single-person pose estimation, multi-person pose estimation, and lightweight human pose estimation. Single-person pose estimation is the basis of multi-person pose estimation. Its methods are based on either keypoint regression or heatmap detection, and there is a trend toward combining the two. Multi-person pose estimation network models are top-down, bottom-up, or of other types. Top-down models are more precise, but their time efficiency is unsatisfactory, especially on crowded input: the more people the input contains, the longer the estimation takes. Bottom-up models are slightly less precise but considerably more efficient, and their running time is independent of the number of people in the input. The two approaches are duals of each other. A top-down method first uses a human body detector to locate each person in the input and then estimates the pose of each detected person; some top-down methods must crop each person accurately and move the crop to the center of the input before each estimation. A bottom-up method first detects all body keypoints in the input and then assigns them to individual people. The emergence of single-stage networks also means that researchers need to pay more attention to the computational cost of network models. A small number of networks combine the top-down and bottom-up approaches and achieve good results.
We summarize the CNN models used by each category of human pose estimation, analyze the characteristics of the network models, and compare the performance of the estimation methods. The structural design of deep convolutional neural network models is becoming increasingly diverse, but these models still have limitations on human pose estimation tasks. We discuss the techniques adopted by two-dimensional human pose estimation models and their open problems, and we suggest possible directions for future research. First, existing models could be improved through pre-processing of the input data: the clarity of the input directly affects the estimation result, so effective image or video pre-processing may offer a new way to improve precision and efficiency. Second, most existing methods process video by cutting it into static frames, so they remain, in essence, image-based pose estimation; real-time pose estimation on video data is essential for pose tracking and action recognition. A few methods already combine deep learning based pose estimation with temporal information, such as optical flow, pose flow, and long short-term memory. Third, images encountered in practical applications are often more crowded and more severely occluded, and these cases remain to be resolved. Finally, recent pose estimation network models have been improved with lightweight methods, which show promise and may become one of the key directions for pose estimation.
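The contrast the abstract draws between top-down and bottom-up multi-person estimation can be illustrated with a small sketch. It is not taken from any of the surveyed models: `CallCounter`, `top_down`, `bottom_up`, and `total_passes` are hypothetical names, and the sketch counts only network forward passes, assuming one detector pass plus one pose-network pass per person for top-down and a single whole-image pass for bottom-up. The cost of grouping keypoints into people is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class CallCounter:
    """Counts how many times each (hypothetical) network is run."""
    detector: int = 0
    pose_net: int = 0


def top_down(num_people: int) -> CallCounter:
    """Top-down: detect people first, then estimate each person's pose."""
    calls = CallCounter()
    calls.detector += 1              # one pass of the human body detector
    for _ in range(num_people):      # one crop per detected person
        calls.pose_net += 1          # single-person pose estimation on that crop
    return calls


def bottom_up(num_people: int) -> CallCounter:
    """Bottom-up: find all keypoints in one pass, then group them into people."""
    calls = CallCounter()
    calls.pose_net += 1              # one pass over the whole image, whatever num_people is
    # Assigning keypoints to people is post-processing, not another network pass,
    # so it is not counted here (an assumption of this sketch).
    return calls


def total_passes(calls: CallCounter) -> int:
    """Total number of network forward passes recorded in the counter."""
    return calls.detector + calls.pose_net


if __name__ == "__main__":
    for n in (1, 5, 20):
        print(n, total_passes(top_down(n)), total_passes(bottom_up(n)))
```

Under these assumptions the top-down count grows linearly with the number of people, while the bottom-up count stays constant, which matches the efficiency trade-off described in the abstract.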

Keywords

deep learning; human pose estimation; model structure; model optimization; lightweight

