孙士杰,宋焕生,张朝阳,张文涛,王璇(长安大学, 西安 710064)
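Objective The extrinsic parameters of an RGB-D camera are used to transform a point cloud from the camera coordinate system into the world coordinate system, with applications in 3D scene reconstruction, 3D measurement, robotics, and object detection. Common calibration methods use a calibration object (such as a chessboard) to calibrate the extrinsic parameters of the RGB-D color camera. Because they do not use depth information, the calibration procedure is hard to simplify; making full use of depth information can simplify it considerably. Moreover, color-image-based methods calibrate the color sensor, whereas most RGB-D applications rely on the depth sensor; a depth-based method can calibrate the pose of the depth sensor directly. Method The depth image is first converted into a 3D point cloud in the camera coordinate system, and the MELSAC method is used to detect planes in the point cloud automatically. The detected planes are then traversed and filtered according to the constraints between the ground plane and the world coordinate system until the ground plane is found. Finally, the spatial relationship between the ground plane and the camera coordinate system is used to compute the extrinsic parameters, i.e., the transformation matrix between points in the camera coordinate system and points in the world coordinate system. Result Taking chessboard-based extrinsic calibration as the benchmark, the experiment processes an RGB-D video stream captured by a PrimeSense camera. The average roll (tilt) angle error is -1.14°, the average pitch angle error is 4.57°, and the average camera height error is 3.96 cm. Conclusion By detecting the ground plane automatically, the method estimates the camera extrinsic parameters accurately and with a high degree of automation. The algorithm is also highly parallel; after parallel optimization it can run in real time, and it can be applied to automatic estimation of robot pose.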
Automatic extrinsic calibration for RGB-D camera based on ground plane detection in point cloud
Sun Shijie, Song Huansheng, Zhang Chaoyang, Zhang Wentao, Wang Xuan (Chang'an University, Xi'an 710064, China)
Objective The extrinsic parameters of an RGB-D camera are used to convert a point cloud from the camera coordinate system to the world coordinate system. They are needed in 3D reconstruction, 3D measurement, robot pose estimation, and target detection, among other applications. An RGB-D camera (e.g., Kinect, PrimeSense, or RealSense) contains two sensors: an RGB sensor, which captures the color image, and a depth sensor, which captures the depth image of the scene. To transform the 3D point cloud from the camera coordinate system to the world coordinate system, the extrinsic parameters of the depth sensor must be calibrated. Common calibration methods use a calibration object (such as a chessboard) to obtain the extrinsic parameters of the color sensor and treat them as an approximation of those of the depth sensor. These methods do not make full use of depth information, which makes the calibration process difficult to simplify. Moreover, neglecting the difference between the depth sensor and the color sensor can cause large errors. To estimate the extrinsic parameters of the depth sensor more accurately, some methods additionally use the extrinsic parameters of the depth sensor relative to the color sensor, but this complicates the calibration process. In short, color-image-based methods yield the parameters of the color sensor, whereas most applications of the RGB-D camera rely on the depth sensor. The parameters of the depth sensor should therefore be calibrated directly, and the depth information should be fully used to simplify the process.
Method We build a spatial constraint relation between the ground plane and the camera, which is used to select the ground plane from the planes detected in the 3D point cloud.
The ground plane should satisfy two conditions: 1) the angle between the z axis of the camera and the ground plane is less than a specified threshold; 2) in the world coordinate system, the z value of the ground plane is larger than that of all points not on the ground plane. We also create the world coordinate system automatically from the detected ground plane. Its origin is the projection of the origin of the camera coordinate system onto the plane, and its y axis is the projection of the camera z axis onto the plane. Its z axis points from the origin of the camera coordinate system toward the origin of the world coordinate system. We calibrate the extrinsic parameters of the RGB-D camera in four steps. First, we reconstruct the 3D point cloud from the depth image captured by the depth sensor. The reconstructed point cloud is expressed in the camera coordinate system, and subsets of its points form a large number of planes. Second, planes in the point cloud are detected by the MELSAC method; at most one of the detected planes is the ground plane. Third, the detected planes are filtered by the spatial constraint rule between the ground plane and the camera until the ground plane is found or all planes have been examined; if no ground plane is found, the process stops. Finally, using the relation between the ground plane and the camera, point sets are selected to calculate the extrinsic parameters.
Result The benchmark is a checkerboard-based extrinsic calibration method that processes only the RGB images of the RGB-D data captured by a PrimeSense camera. We recorded an 89.4 s video for the experiment. The recording contains two streams: a 3-channel RGB video and a 1-channel depth video.
A 7×7 checkerboard is visible in every frame of the RGB video, which is processed by the checkerboard-based method. The input of our proposed method is the frames of the depth video. The results show that the average tilt angle error is -1.14°, the average pitch angle error is 4.57°, and the average camera height error is 3.96 cm. We also test robustness to noise: Gaussian noise of increasing variance is added to the depth frames, and the calibration result is obtained for each variance. Calibration stability decreases as the noise variance increases, and the method performs effectively when the variance is below 0.01.
Conclusion The proposed method fully uses the depth information of the RGB-D camera and simplifies the extrinsic calibration of the depth sensor, so it can be used in practical applications. It detects the ground plane automatically and requires no calibration object. It can calibrate each frame of the recorded video accurately and is not sensitive to noise in the depth image. For convenience, the source code is published. In addition, the algorithm is highly parallel: plane estimation in the 3D point cloud and plane filtering can both be implemented in parallel, and with such an implementation the method can run in real time.
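The world coordinate system and the two ground-plane conditions stated in the Method part can be illustrated with a minimal sketch. It is not the published source code. It assumes a candidate plane is already available in the camera coordinate system as `normal . p + d = 0` (for example from a sample-consensus plane detector). The function names, the right-handed choice of the x axis, and the `max_angle_deg` and `tol` defaults are hypothetical choices made only for this illustration.

```python
import numpy as np

def extrinsics_from_ground_plane(normal, d):
    """Return a 4x4 camera-to-world transform for the plane normal . p + d = 0."""
    n = np.asarray(normal, dtype=float)
    scale = np.linalg.norm(n)
    n, d = n / scale, d / scale                # normalise so |d| is the camera height
    if abs(d) < 1e-9:
        raise ValueError("camera origin lies on the plane")
    origin = -d * n                            # camera origin projected onto the plane
    z_axis = origin / np.linalg.norm(origin)   # from camera origin toward world origin
    cam_z = np.array([0.0, 0.0, 1.0])
    y_axis = cam_z - np.dot(cam_z, n) * n      # camera z axis projected onto the plane
    if np.linalg.norm(y_axis) < 1e-9:
        raise ValueError("camera z axis is perpendicular to the plane")
    y_axis /= np.linalg.norm(y_axis)
    x_axis = np.cross(y_axis, z_axis)          # right-handed frame (assumption)
    R = np.stack([x_axis, y_axis, z_axis])     # rows are the world axes in camera coordinates
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = -R @ origin                     # so that p_world = R @ (p_cam - origin)
    return T

def is_ground_plane(normal, d, points, max_angle_deg=60.0, tol=0.02):
    """Check the two ground-plane conditions for an N x 3 camera-frame point cloud."""
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    # Condition 1: angle between the camera z axis and the plane is below the threshold.
    angle = np.degrees(np.arcsin(min(1.0, abs(n[2]))))
    if angle >= max_angle_deg:
        return False
    # Condition 2: no point has a larger world z value than the plane (z = 0), up to tol.
    T = extrinsics_from_ground_plane(normal, d)
    z_world = points @ T[2, :3] + T[2, 3]
    return bool(np.all(z_world <= tol))
```

With this convention the camera sits at world z = -h, where h is the camera height, and every point above the ground has a negative z value, which is why the ground plane has the largest z. A candidate such as a tabletop fails the second condition because the real ground lies at positive z relative to it.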