Published: 2018-06-16 | DOI: 10.11834/jig.170490 | 2018, Volume 23, Number 6 | Image Understanding and Computer Vision

Received: 2017-09-08; Revised: 2018-01-06. Funding: National Natural Science Foundation of China (61572083); Fundamental Research Funds for the Central Universities (310824171003); Natural Science Foundation of Shaanxi Province (2017JQ6064). First author: Sun Shijie (1989- ), male, Ph.D. candidate in Traffic Information Engineering and Control at Chang'an University. His research interests include 3D point cloud object recognition, machine vision, tracking, and artificial intelligence. E-mail: shijieSun@chd.edu.cn. CLC number: TP391. Document code: A. Article ID: 1006-8961(2018)06-0866-08

Automatic extrinsic calibration for RGB-D camera based on ground plane detection in point cloud
Sun Shijie, Song Huansheng, Zhang Chaoyang, Zhang Wentao, Wang Xuan
Chang'an University, Xi'an 710064, China
Supported by: National Natural Science Foundation of China (61572083); Fundamental Research Funds for the Central Universities (310824171003); Natural Science Foundation of Shaanxi Province, China (2017JQ6064)

# Abstract

Objective The extrinsic parameters of an RGB-D camera are used to convert a point cloud from the camera coordinate system to the world coordinate system. They are needed in 3D reconstruction, 3D measurement, robot pose estimation, and target detection, among other applications. An RGB-D camera (e.g., Kinect, PrimeSense, and RealSense) consists of two sensors: an RGB sensor and a depth sensor. The former captures the RGB image of the scene, whereas the latter captures the depth image. To transform the 3D point cloud from the camera coordinate system to the world coordinate system, the extrinsic parameters of the depth sensor must be calibrated. General calibration methods use calibration objects (such as a chessboard) to obtain the extrinsic parameters of the color sensor, which are then taken as an approximation of those of the depth sensor. These methods do not make full use of the depth information, which keeps the calibration process cumbersome, and ignoring the difference between the depth sensor and the color sensor can cause large errors. To estimate the extrinsic parameters of the depth sensor accurately, some methods additionally calibrate the pose of the depth sensor relative to the color sensor; however, these methods complicate the calibration process. To simplify the extrinsic calibration of the depth sensor, the depth information should be fully utilized. Existing methods produce results based on the color image and the parameters of the color sensor, whereas the majority of RGB-D applications rely on the depth sensor; its parameters should therefore be calibrated directly. Method We build a spatial constraint relation between the ground plane and the camera, which is used to select the ground plane from the planes detected in the 3D point cloud. The ground plane must satisfy the following conditions: 1) the angle between the z axis of the camera and the ground plane is less than a specified threshold; 2) the z value of the ground plane in the world coordinate system is larger than that of all points not on the ground plane. The world coordinate system is then created automatically from the detected ground plane: its origin is the projection of the origin of the camera coordinate system onto the plane, its y axis is the projection of the z axis of the camera coordinate system onto the plane, and its z axis points from the origin of the camera coordinate system toward the origin of the world coordinate system. The extrinsic parameters of the RGB-D camera are calibrated in the following steps. First, the 3D point cloud is reconstructed from the depth image retrieved by the depth sensor of the RGB-D camera. The reconstructed point cloud is in the camera coordinate system, and subsets of it form a large number of planes. Second, planes in the 3D point cloud are detected by the MLESAC method; at most one of the detected planes is the ground plane. Third, the spatial constraint rule between the ground plane and the camera is built, and the detected planes are filtered by this rule until the ground plane is found or all planes have been examined; the process stops if no ground plane can be found. Finally, using the relation between the ground plane and the camera, point sets are selected to calculate the extrinsic parameters.
Result In the experiment, the benchmark is the result of a checkerboard-based extrinsic calibration method that processes only the RGB stream of the RGB-D data retrieved from a PrimeSense camera. An 89.4 s video was recorded and used in the experiment. It contains two sub-videos: a three-channel RGB video and a single-channel depth video. A 7×7 checkerboard appears in every frame of the RGB video and is processed by the checkerboard-based method, whereas the input of the proposed method is the corresponding frame of the depth video. The results show that the average tilt angle error is -1.14°, the average pitch angle error is 4.57°, and the average camera height error is 3.96 cm. An experiment testing robustness to noise was also performed: the variance of the Gaussian noise added to the depth frames was increased gradually, and the calibration result was obtained for each variance. The stability of the calibration decreases as the variance of the Gaussian noise increases, and the method performs effectively when the variance is below 0.01. Conclusion The proposed method fully utilizes the depth information of the RGB-D camera and simplifies the extrinsic calibration of the depth sensor; thus, it can be used in practical applications. For convenience, the source code is also published. The method detects the ground plane automatically and requires no calibration objects. It calibrates each frame of the recorded video accurately and is not sensitive to noise in the depth image. In addition, the algorithm is highly parallel: the estimation of planes in the 3D point cloud and their filtering can be implemented in parallel, which would give the method real-time performance.

# Key words

RGB-D camera; automatic extrinsic calibration; 3D point cloud; plane detection; depth map

# 0 Introduction

RGB-D cameras, such as the Kinect [1-3], PrimeSense, and Asus Xtion Pro, add an infrared camera to a traditional RGB camera. Fig. 1 shows the structure of the Kinect: it contains a color camera, an infrared camera, and an infrared illuminator. The color camera outputs color images, the infrared camera outputs depth maps, and the infrared illuminator emits the infrared light used to compute the depth map. With the advent of low-cost RGB-D cameras such as the Kinect, these devices are increasingly used in 3D scene reconstruction and navigation [4-5], object recognition and tracking [6-7], 3D measurement, and other applications. Such applications often require the extrinsic parameters of the camera to be calibrated.

# 1 Extrinsic calibration process

The RGB-D camera contains a color camera and an infrared camera. This paper calibrates the extrinsic parameters of the infrared camera; hereafter, "extrinsic calibration" refers to the extrinsic calibration of the infrared camera. The extrinsic parameters of the camera are

$\boldsymbol{T}_{\mathrm{C}}^{\mathrm{W}} = \begin{bmatrix} \boldsymbol{R}_{\mathrm{C}}^{\mathrm{W}} & \boldsymbol{0} \\ \boldsymbol{t}_{\mathrm{C}}^{\mathrm{W}} & 1 \end{bmatrix}$ (1)

# 1.1 Construction of the camera and world coordinate systems

1) The origin of the world coordinate system is the projection of the origin of the camera coordinate system onto the ground plane;

2) The $Y_{\mathrm{W}}$ axis of the world coordinate system is the projection of the $Z_{\mathrm{C}}$ axis of the camera coordinate system onto the ground plane;

3) The $Z_{\mathrm{W}}$ axis of the world coordinate system is perpendicular to the ground plane and points downward;

4) The resulting coordinate system is right-handed.

# 1.2 Computing the camera extrinsic parameters

$\boldsymbol{T}_{\mathrm{C}}^{\mathrm{W}} = \left(\boldsymbol{T}_{\mathrm{W}}^{\mathrm{C}}\right)^{-1}$ (2)

$\left\{ \begin{aligned} \boldsymbol{P}_{\mathrm{W}}^{(0)} &= [0\;\;0\;\;0]\\ \boldsymbol{P}_{\mathrm{W}}^{(1)} &= [1\;\;0\;\;0]\\ \boldsymbol{P}_{\mathrm{W}}^{(2)} &= [0\;\;1\;\;0]\\ \boldsymbol{P}_{\mathrm{W}}^{(3)} &= [0\;\;0\;\;1] \end{aligned} \right.$ (3)

$\boldsymbol{T}_{\mathrm{W}}^{\mathrm{C}} = \begin{bmatrix} \dfrac{\boldsymbol{P}_{\mathrm{C}}^{(1)} - \boldsymbol{P}_{\mathrm{C}}^{(0)}}{\left\|\boldsymbol{P}_{\mathrm{C}}^{(1)} - \boldsymbol{P}_{\mathrm{C}}^{(0)}\right\|_2} & 0 \\[2mm] \dfrac{\boldsymbol{P}_{\mathrm{C}}^{(2)} - \boldsymbol{P}_{\mathrm{C}}^{(0)}}{\left\|\boldsymbol{P}_{\mathrm{C}}^{(2)} - \boldsymbol{P}_{\mathrm{C}}^{(0)}\right\|_2} & 0 \\[2mm] \dfrac{\boldsymbol{P}_{\mathrm{C}}^{(3)} - \boldsymbol{P}_{\mathrm{C}}^{(0)}}{\left\|\boldsymbol{P}_{\mathrm{C}}^{(3)} - \boldsymbol{P}_{\mathrm{C}}^{(0)}\right\|_2} & 0 \\[2mm] \boldsymbol{P}_{\mathrm{C}}^{(0)} & 1 \end{bmatrix}$ (4)

where $\boldsymbol{P}_{\mathrm{C}}^{(i)}$ is the point $\boldsymbol{P}_{\mathrm{W}}^{(i)}$ expressed in the camera coordinate system.

$\boldsymbol{n}_{\mathrm{y}} = \frac{\boldsymbol{P}_{Z_{\mathrm{c}}} - \boldsymbol{P}_{\mathrm{C}}^{(0)}}{\left\|\boldsymbol{P}_{Z_{\mathrm{c}}} - \boldsymbol{P}_{\mathrm{C}}^{(0)}\right\|_2}$ (8)

where $\boldsymbol{P}_{Z_{\mathrm{c}}}$ is the projection of the point $[0\;\;0\;\;1]$ of the camera coordinate system onto the ground plane. The projection point is obtained by solving the system of equations

$\left\{ \begin{aligned} &ax + by + cz + d = 0\\ &\frac{x - x_0}{a} = \frac{y - y_0}{b} = \frac{z - z_0}{c} \end{aligned} \right.$ (9)
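As a bridging step not written out above: parameterizing the projection line as $(x, y, z) = (x_0 + at,\; y_0 + bt,\; z_0 + ct)$ and substituting it into the plane equation gives

$t = -\frac{a x_0 + b y_0 + c z_0 + d}{a^2 + b^2 + c^2}$

so that the projection point is $(x_0 + at,\; y_0 + bt,\; z_0 + ct)$.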

$\boldsymbol{P}_{\mathrm{C}}^{(1)} = \frac{\left(\boldsymbol{P}_{\mathrm{C}}^{(2)} - \boldsymbol{P}_{\mathrm{C}}^{(0)}\right) \times \left(\boldsymbol{P}_{\mathrm{C}}^{(3)} - \boldsymbol{P}_{\mathrm{C}}^{(0)}\right)}{\left\|\left(\boldsymbol{P}_{\mathrm{C}}^{(2)} - \boldsymbol{P}_{\mathrm{C}}^{(0)}\right) \times \left(\boldsymbol{P}_{\mathrm{C}}^{(3)} - \boldsymbol{P}_{\mathrm{C}}^{(0)}\right)\right\|_2} + \boldsymbol{P}_{\mathrm{C}}^{(0)}$ (10)
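Putting Eqs. (2)-(10) together, the following is a minimal NumPy sketch (an illustration under the stated conventions, not the authors' published code) that builds the world coordinate frame of Section 1.1 from a detected ground plane $(a, b, c, d)$ and returns the extrinsic matrix; both function names are illustrative.

```python
import numpy as np

def project_point_to_plane(p, plane):
    """Orthogonal projection of point p onto plane ax + by + cz + d = 0 (Eq. (9))."""
    n, d = plane[:3], plane[3]
    t = -(np.dot(n, p) + d) / np.dot(n, n)
    return p + t * n

def extrinsic_from_ground_plane(plane):
    """Build T_C^W from the ground plane, following Section 1.1 and Eqs. (2)-(10).

    World origin: projection of the camera origin onto the plane (P_C^(0)).
    Y_W: projection of the camera Z_C axis onto the plane (Eq. (8)).
    Z_W: unit vector from the camera origin toward the world origin.
    X_W: cross product completing the right-handed frame (cf. Eq. (10)).
    """
    plane = np.asarray(plane, dtype=np.float64)
    p0 = project_point_to_plane(np.zeros(3), plane)                  # P_C^(0)
    p_zc = project_point_to_plane(np.array([0.0, 0.0, 1.0]), plane)  # P_{Z_c}
    n_y = (p_zc - p0) / np.linalg.norm(p_zc - p0)                    # Eq. (8)
    n_z = p0 / np.linalg.norm(p0)            # downward, toward the world origin
    n_x = np.cross(n_y, n_z)                 # right-handed frame: X = Y x Z
    T_w_c = np.eye(4)                        # row-vector convention of Eq. (1)
    T_w_c[:3, :3] = np.vstack([n_x, n_y, n_z])   # rotation rows, Eq. (4)
    T_w_c[3, :3] = p0                            # translation row
    return np.linalg.inv(T_w_c)                  # Eq. (2): T_C^W = (T_W^C)^(-1)
```

Under the row-vector convention of Eq. (1), a camera-coordinate point then maps to world coordinates as $\boldsymbol{p}_{\mathrm{W}} = [\boldsymbol{p}_{\mathrm{C}}\;\;1]\,\boldsymbol{T}_{\mathrm{C}}^{\mathrm{W}}$.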

# 2.1 Converting the depth map to a 3D point cloud in the camera coordinate system
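The details of this step are not reproduced in this excerpt. As a reference point, the standard pinhole back-projection is a common way to implement this conversion; the sketch below assumes depth sensor intrinsics $f_x$, $f_y$, $c_x$, $c_y$ and raw depth in millimetres, neither of which is specified here.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy, depth_scale=0.001):
    """Back-project a depth map into a 3D point cloud in camera coordinates.

    Standard pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
    `depth` holds raw sensor values (assumed millimetres); `depth_scale`
    converts them to metres. Zero-depth pixels are invalid and dropped.
    """
    v, u = np.indices(depth.shape)        # pixel rows (v) and columns (u)
    z = depth.astype(np.float64) * depth_scale
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]
```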

4) Traverse every frame of the depth video, detect the ground plane with the method described in this paper, and estimate the camera extrinsic parameters, yielding a camera pose ($H_{\text{plane}}$, $\phi_{\text{plane}}$, $\theta_{\text{plane}}$) for each depth frame; then compute for each frame the camera height difference ($\Delta H$), the tilt angle difference ($\Delta \theta$), and the pitch angle difference ($\Delta \phi$) as

 $\left\{ \begin{array}{l} \Delta H = {H_{{\rm{chessboard}}}} - {H_{{\rm{plane}}}}\\ \Delta \phi = {\phi _{{\rm{chessboard}}}} - {\phi _{{\rm{plane}}}}\\ \Delta \theta = {\theta _{{\rm{chessboard}}}} - {\theta _{{\rm{plane}}}} \end{array} \right.$ (13)
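Aggregating these per-frame differences yields the statistics reported in Table 2. A minimal sketch (the pose arrays below are placeholders, one row $(H, \phi, \theta)$ per frame from each method):

```python
import numpy as np

# Placeholder per-frame poses (H, phi, theta) from the two methods;
# in practice these come from the chessboard and plane calibrations.
chessboard_poses = np.array([[1.52, -3.1, 7.0], [1.49, -2.8, 6.6]])
plane_poses = np.array([[1.48, -2.9, 6.8], [1.47, -2.7, 6.5]])

diff = chessboard_poses - plane_poses   # Eq. (13) for all frames at once
for name, col in zip(("Delta H", "Delta phi", "Delta theta"), diff.T):
    print(f"{name}: max={col.max():.2f}, min={col.min():.2f}, mean={col.mean():.2f}")
```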

Table 1 Parameters of the video

| Parameter | Color video | Depth video |
| --- | --- | --- |
| Frame size/pixels | 320×240 | 320×240 |
| Frame rate/(frame/s) | 30 | 30 |
| Duration/s | 89.400 | 89.400 |
| Pixel depth/bit | 24 | 16 |
| Video format | RGB24 | Mono16 |

Table 2 Extrinsic parameter errors of the camera

| Statistic | θ/(°) | ϕ/(°) | H/cm |
| --- | --- | --- | --- |
| Maximum | 7.48 | 4.89 | 15.22 |
| Minimum | -0.75 | -9.62 | -1.14 |
| Average | 15.22 | -3.86 | 3.96 |

# 3.2 Error analysis

1) Quantization error in corner detection. Because the video frames are only 320×240 pixels, the quantization error of corner detection is considerable when computing the camera intrinsic and extrinsic parameters;

2) The checkerboard method measures the extrinsic parameters of the color sensor of the RGB-D camera, whereas the proposed method measures those of its infrared camera;

3) Noise in the scene data captured by the RGB-D camera affects ground plane detection and, in turn, the accuracy of the extrinsic parameters;

4) The stopping parameters of the MLESAC iteration in ground plane detection also affect the experimental results.

# 3.3 Influence of scene noise on ground plane detection

RGB-D cameras have many noise sources (e.g., temperature, the incident angle and intensity of ambient light, and surface texture [13]). The MLESAC method adopted in this paper can handle small amounts of noise, but not scenes with excessive noise or excessive missing data:

1) Strong sunlight produces excessive noise points on the ground plane, making the estimated plane parameters inaccurate and slightly affecting the camera height and tilt angle;

2) Excessively high ground reflectivity (e.g., a mirror placed on the ground) causes missing data in that region; if too much data is missing, the ground plane cannot be detected and the extrinsic parameters cannot be estimated.
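To reproduce the noise-robustness experiment described in the abstract, zero-mean Gaussian noise of increasing variance can be injected into the depth frames before plane detection. The following is a minimal sketch under the assumption that depth values are normalized to [0, 1] (which the reported variance threshold of 0.01 suggests); the function name is illustrative.

```python
import numpy as np

def add_depth_noise(depth, variance, rng=None):
    """Add zero-mean Gaussian noise of the given variance to a depth map
    normalized to [0, 1], clipping the result back to the valid range."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = depth + rng.normal(0.0, np.sqrt(variance), size=depth.shape)
    return np.clip(noisy, 0.0, 1.0)

# Sweep the noise variance as in the robustness experiment; calibration
# remained effective below a variance of 0.01 in the reported results.
# for variance in (0.001, 0.005, 0.01, 0.05):
#     noisy_depth = add_depth_noise(depth, variance)
#     ... run ground plane detection and extrinsic estimation ...
```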

# 参考文献

• [1] Microsoft. Kinect for Xbox 360[EB/OL]. 2017-01-01[2017-10-12]. http://www.xbox.com/en-US/xbox-one/accessories/kinect.
• [2] Wikipedia. Kinect[EB/OL]. 2018-01-12[2017-08-29]. https://en.wikipedia.org/wiki/Kinect.
• [3] Freedman B, Shpunt A, Machline M, et al. Depth mapping using projected patterns: US, US8493496B2[P]. 2013-07-23.
• [4] Endres F, Hess J, Sturm J, et al. 3-D mapping with an RGB-D camera[J]. IEEE Transactions on Robotics, 2014, 30(1): 177–187. [DOI:10.1109/TRO.2013.2279412]
• [5] Labbé M, Michaud F. Online global loop closure detection for large-scale multi-session graph-based SLAM[C]//Proceedings of 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems. Chicago, IL, USA: IEEE, 2014: 2661-2666. [DOI:10.1109/IROS.2014.6942926]
• [6] Munaro M, Basso F, Menegatti E. OpenPTrack: open source multi-camera calibration and people tracking for RGB-D camera networks[J]. Robotics and Autonomous Systems, 2016, 75: 525–538. [DOI:10.1016/j.robot.2015.10.004]
• [7] Munaro M, Menegatti E. Fast RGB-D people tracking for service robots[J]. Autonomous Robots, 2014, 37(3): 227–242. [DOI:10.1007/s10514-014-9385-0]
• [8] Liao Q H, Liu M, Tai L, et al. Extrinsic calibration of 3D range finder and camera without auxiliary object or human intervention[J]. arXiv: 1703.04391, 2017. http://arxiv.org/abs/1703.04391
• [9] Basso F, Menegatti E, Pretto A. Robust intrinsic and extrinsic calibration of RGB-D cameras[J]. arXiv: 1701.05748, 2017. http://arxiv.org/abs/1701.05748
• [10] Zhang Z. A flexible new technique for camera calibration[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22(11): 1330–1334. [DOI:10.1109/34.888718]
• [11] Li S B, Zhuo Q. A new approach to calibrate range image and color image from Kinect[C]//Proceedings of the 4th International Conference on Intelligent Human-Machine Systems and Cybernetics. Nanchang, Jiangxi, China: IEEE, 2012, 2(24): 252-255. [DOI:10.1109/IHMSC.2012.156]
• [12] Torr P H S, Zisserman A. MLESAC: a new robust estimator with application to estimating image geometry[J]. Computer Vision and Image Understanding, 2000, 78(1): 138–156. [DOI:10.1006/cviu.1999.0832]
• [13] Belhedi A, Bartoli A, Bourgeois S, et al. Noise modelling in time-of-flight sensors with application to depth noise removal and uncertainty estimation in three-dimensional measurement[J]. IET Computer Vision, 2015, 9(6): 967–977. [DOI:10.1049/iet-cvi.2014.0135]