A survey of 2D human pose encoding and decoding methods: from the perspective of resolving ambiguity problems

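Yu Li, Du Congju, Yan Zengqiang, Zhao Huijuan, He Shuangjiang (School of Electronic Information and Communications, Huazhong University of Science and Technology)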

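Abstract
Human pose estimation provides key technical support for numerous applications in entertainment, health, security, and other fields. The goal of human pose encoding and decoding is to extract features from the raw input data, build them into an intermediate representation that is easier to process and understand, and recover an interpretable human pose from that representation. In real-world scenes, however, factors such as illumination, motion blur, occlusion, complex poses, camera viewpoint, and image resolution mean that human pose estimation is often troubled by distributive ambiguity, scale ambiguity, and associative ambiguity. Well-designed encoding and decoding is therefore the key to resolving the various ambiguity problems in human pose estimation. This paper first introduces human pose modeling methods, which are a prerequisite for human pose encoding and decoding. Then, for the distributive ambiguity problem, methods are introduced from three aspects: distribution constraints, structural constraints, and iterative constraints. The scale ambiguity problem is divided into keypoint scale ambiguity and pixel scale ambiguity, and the related methods based on scale representation, unbiased transformation, and integral regression are introduced. For the associative ambiguity problem, four classes of human pose encoding and decoding methods are summarized: those based on graph optimization, limb vectors, instance centers, and reference tags. The performance of each method is also summarized and analyzed. Finally, future research directions for human pose encoding and decoding are discussed.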
Review of 2D human pose encoding and decoding methods: from the perspective of ambiguity mitigation

Yu Li, Du Congju, Yan Zengqiang, Zhao Huijuan, He Shuangjiang (School of Electronic Information and Communications, Huazhong University of Science and Technology)

Abstract
Among the subfields of computer vision, human pose estimation is an especially active area of research. It aims to precisely localize the body parts or keypoints of each human instance in a given image or video and to reconstruct the skeletal structure of the human body. Human pose estimation provides technical support for applications such as human pose tracking, human action recognition, person re-identification, human-object interaction, and person image generation. Its uses span entertainment (such as virtual reality, augmented reality, and animation), health (such as healthcare and sports), and security (such as surveillance). Consequently, high-performance and real-time human pose estimation has become a prominent focus of current computer vision research. In recent years, human pose estimation methods have been studied extensively. One line of research centers on developing and refining high-performance or lightweight network architectures. Notable examples include Hourglass, SimpleBaseline, HRNet, and Lite-HRNet. These architectures have also found broad use in other visual tasks, including object detection and instance segmentation. Another line of research is dedicated to new pose encoding and decoding schemes, which are intended to make human pose estimation models more accurate and robust. The encoding and decoding process is the pivotal stage in which features are extracted from the input data and translated into interpretable human poses. The encoding process extracts features from the raw input data and shapes them into an intermediate representation. This intermediate form, which could be feature maps or latent vectors, simplifies processing and comprehension; the subsequent decoding process recovers the final human pose from this encoded representation.
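As a concrete illustration of the encoding and decoding process described above, the following minimal sketch assumes the intermediate representation is a Gaussian heatmap on a small pixel grid. The function names `encode_heatmap` and `decode_argmax`, the grid size, and the value of `sigma` are illustrative assumptions rather than details of any particular method covered by the survey, and plain Python lists are used only to keep the example dependency-free.

```python
import math

def encode_heatmap(x, y, width, height, sigma=1.0):
    """Encode one keypoint (x, y) as a 2D Gaussian heatmap (list of rows)."""
    return [
        [math.exp(-((col - x) ** 2 + (row - y) ** 2) / (2.0 * sigma ** 2))
         for col in range(width)]
        for row in range(height)
    ]

def decode_argmax(heatmap):
    """Decode by returning the (x, y) grid cell with the highest response."""
    best_value, best_xy = -1.0, (0, 0)
    for row, values in enumerate(heatmap):
        for col, value in enumerate(values):
            if value > best_value:
                best_value, best_xy = value, (col, row)
    return best_xy

# Hypothetical keypoint location in pixel coordinates (assumed for illustration).
heatmap = encode_heatmap(x=5.3, y=2.8, width=12, height=8)
decoded = decode_argmax(heatmap)
print(decoded)
```

In this sketch the decoded location is the grid cell nearest to the encoded keypoint, so the sub-pixel part of the original coordinate is lost when the pose is recovered from the discrete grid.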
Despite the considerable progress made in human pose estimation research, ambiguity remains a significant obstacle in real-world scenarios. Different poses may be mapped to similar or overlapping low-dimensional representations because of factors such as illumination, motion blur, occlusion, complex poses, camera perspective, and image resolution. The resulting ambiguity and uncertainty in the estimated poses constitute the ambiguity challenge in human pose estimation, which encompasses distributive, scale, and associative ambiguity. First, when a hand is occluded, the precise location of the wrist becomes uncertain, which yields distributive ambiguity. Second, when the camera is positioned farther from the human instance, the body appears smaller in the image, and its true scale is often difficult to determine without sufficient contextual detail, which leads to scale ambiguity. Third, when two human instances occlude each other, assigning the detected keypoints to the correct human instance becomes difficult, which gives rise to associative ambiguity. Well-designed methods for encoding and decoding human poses allow human pose estimation to be modeled and solved in a suitable manner. They provide effective optimization objectives and feature representations, allowing more reasonable and robust human pose estimation models to be constructed. Investigating encoding and decoding for human pose estimation is therefore of substantial research importance. Most previous reviews of human pose estimation have focused on the design of network structures, even though the ambiguity problem can significantly affect estimation performance. Our objective is to provide a summary and analysis of current research on pose encoding and decoding methods.
This survey includes a thorough investigation of the ambiguity challenges inherent in human pose estimation. We first introduce human pose modeling techniques, which directly determine how expressively a human pose can be represented. Second, pose encoding and decoding methods are categorized by the type of ambiguity they address: distributive, scale, or associative. To address distributive ambiguity, three strategies are explored: distributive, structural, and iterative constraints. Scale ambiguity is further divided into keypoint-wise and pixel-wise scale ambiguity. The former is mainly addressed through scale-representation-based methods, and the latter through unbiased-transformation-based and integral-regression-based methods. For associative ambiguity, the approaches fall into four groups: graph-, limb-, center-, and embedding-based methods. We then summarize the encoding and decoding methods and compare their performance, which helps clarify the strengths and limitations of each approach. Lastly, potential directions for future development are discussed. The purpose of this paper is to open up a research direction for the community: addressing the ambiguity problem in human pose estimation through encoding and decoding. Resolving the ambiguity challenges in human pose estimation is expected to broaden its potential applications.
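The integral-based decoding mentioned above can be illustrated with a small sketch. It assumes that the idea is to take the expectation of the pixel coordinates under a normalized heatmap instead of picking the single strongest cell. For simplicity the heatmap is normalized by its sum rather than by a softmax, and the names `gaussian_heatmap`, `decode_argmax`, and `decode_integral`, as well as the keypoint location, are hypothetical choices made only for this example.

```python
import math

def gaussian_heatmap(x, y, size, sigma=1.0):
    """Square Gaussian heatmap centered on the keypoint (x, y)."""
    return [
        [math.exp(-((col - x) ** 2 + (row - y) ** 2) / (2.0 * sigma ** 2))
         for col in range(size)]
        for row in range(size)
    ]

def decode_argmax(heatmap):
    """Hard decoding: integer coordinates of the strongest grid cell."""
    cells = ((value, (col, row))
             for row, values in enumerate(heatmap)
             for col, value in enumerate(values))
    return max(cells)[1]

def decode_integral(heatmap):
    """Soft decoding: expected (x, y) under the sum-normalized heatmap."""
    total = sum(sum(values) for values in heatmap)
    x = sum(col * value
            for values in heatmap
            for col, value in enumerate(values)) / total
    y = sum(row * value
            for row, values in enumerate(heatmap)
            for value in values) / total
    return x, y

# Assumed ground-truth keypoint with a sub-pixel location.
true_xy = (7.3, 8.8)
heatmap = gaussian_heatmap(*true_xy, size=16)
hard_xy = decode_argmax(heatmap)    # limited to integer grid positions
soft_xy = decode_integral(heatmap)  # continuous estimate
print(hard_xy, soft_xy)
```

Under these assumptions the hard decoding can only return an integer grid position, while the expectation yields a continuous coordinate that stays close to the encoded sub-pixel location, provided the Gaussian lies well inside the grid.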
