发布时间: 2020-06-16
摘要点击次数:
全文下载次数:
DOI: 10.11834/jig.190401
2020 | Volume 25 | Number 6

图像分析和识别

特征点聚类高精度视差图像拼接

谢从华¹, 张冰^1,2, 高蕴梅³

1. 常熟理工学院计算机科学与工程学院, 常熟 215500;

2. 苏州大学计算机科学与技术学院, 苏州 215006;

3. 常熟理工学院图书馆, 常熟 215500

收稿日期: 2019-08-14; 修回日期: 2019-11-01; 预印本日期: 2019-11-08

基金项目: 国家自然科学基金项目(61772242，61572239，61402204)；国家级大学生实践创新创业训练计划项目(201910333028)；江苏省大学生创新创业训练计划项目(201810333050X，201910333022Z)

第一作者简介: 谢从华, 1978年生, 男, 副教授, 主要研究方向为图像处理和模式识别。E-mail:conghuaxie@cslg.edu.cn;
张冰, 女, 硕士研究生, 主要研究方向为图像处理。E-mail:Zhangb@cslg.edu.cn;
高蕴梅, 女, 本科生, 主要研究方向为数字图书馆服务。E-mail:gaoyunmei@aliyun.com.

中图法分类号: TP391

文献标识码: A

文章编号: 1006-8961(2020)06-1180-10

摘要

目的针对不同视点下具有视差的待拼接图像中，特征点筛选存在漏检率高和配准精度低的问题，提出了一种基于特征点平面相似性聚类的图像拼接算法。方法根据相同平面特征点符合同一变换的特点，计算特征点间的相似性度量，利用凝聚层次聚类把特征点划分为不同平面，筛选误匹配点。将图像划分为相等大小的网格，利用特征点与网格平面信息计算每个特征点的权重，通过带权重线性变换计算网格的局部单应变换矩阵。最后利用多频率融合方法融合配准图像。结果在20个不同场景图像数据上进行特征点筛选比较实验，随机抽样一致性(random sample consensus，RANSAC)算法的平均误筛选个数为30，平均误匹配个数为8，而本文方法的平均误筛选个数为3，平均误匹配个数为2。对20个不同场景的多视角图像，本文方法与AutoStitch(automatic stitching)、APAP(as projective as possible)和AANAP(adaptive as-natural-as-possible)等3种算法进行了图像拼接比较实验，本文算法相比性能第2的算法，峰值信噪比(peak signal to noise ratio，PSNR)平均提高了8.7%，结构相似性(structural similarity，SSIM)平均提高了9.6%。结论由本文提出的基于特征点平面相似性聚类的图像拼接算法处理后的图像保留了更多的特征点，因此提高了配准精度，能够取得更好的拼接效果。

关键词

图像拼接; 图像配准; 层次聚类; 视差图像; 局部单应变换; 特征点匹配

High-precision parallax image stitching method using feature points clustering

Xie Conghua¹, Zhang Bing^1,2, Gao Yunmei³

1. School of Computer Science and Engineering, Changshu Institute of Technology, Changshu 215500, China;

2. School of Computer Science and Technology, Soochow University, Suzhou 215006, China;

3. Library, Changshu Institute of Technology, Changshu 215500, China

Supported by: National Natural Science Foundation of China(61772242, 61572239, 61402204)

Abstract

Objective Image stitching is a technology for solving the field-of-view (FOV) limitation of images by stitching multiple overlapping images to generate a wide-FOV image. Parallax image stitching remains a challenging problem. Although numerous methods and abundant commercial tools are beneficial in helping users organize and appreciate photo collections,many of these tools fail to provide convincing results when parallax images are given. Parallax image stitching methods with local homography transforms for partitioning cells are the most popular. However,many of these methods have a high rate of wrongly matching feature points and low accuracy in aligning feature points in different viewpoint images. We propose a novel stitching method using the hierarchical agglomerative clustering of feature points with their plane similarity information to improve the precision rate of matching feature points. Method First,we develop a feature point shift algorithm using the clustering results of the feature points with planar information. The scale-invariant transform feature points of all the images with different viewpoints are extracted. The k-nearest neighbors for each feature point are found using the k-d (lk-dimension) tree algorithm. $ K$ minimum sample sets are constructed,and each set includes four noncollinear feature points to compute the homography and residual matrices. Second,the planar information similarities of all the feature points are computed in accordance with the residual matrix,and all the feature points are divided into different clusters using the hierarchical agglomerative clustering algorithm. The feature points in each cluster have a common plane and the same homography transformation. If the mean of the residual matrix in one cluster is larger than a threshold,then the feature points in this cluster are labeled as wrongly matching feature points. Third,we propose an image stitching algorithm that partitions an image into cells with blend weights for multiplane images. All images are partitioned into equal-sized cells. The local homography transformation of each cell is computed via linear transformation with blend weights. The weight of each feature point is computed using the plane information of feature points. If one feature point and its cluster cent point have the same planar label,then the weight is equal to 1; otherwise,the weight is equal to their Gaussian kernel radial distance. Lastly,the aligned images are rendered as a panorama using a multiband blending method. Result We compare our feature point detection algorithm with the random sample consensus (RANSAC) algorithm on traditional building and pavilion images. The RANSAC algorithm found 427 and 541 matched feature points; our algorithm found 435 and 589 matched feature points. For the traditional building images,the RANSAC algorithm has six pairs of wrongly matched points; our algorithm has only one pair of wrongly matched points. For the pavilion images,the RANSAC algorithm has up to 20 pairs of wrongly matched points; our algorithm has only one pair of wrongly matched points. On 20 other different scene images,the average number of error feature points detected by the RANSAC algorithm is 30,and that of our method is only 3. The average number of wrongly matched point pairs of the RANSAC algorithm is eight,and that of our method is only two. We compare our image stitching method with three state-of-the-art methods (automatic stitching(AutoStitch),as projective as possible(APAP),and adaptive as-natural-as-possible(AANAP)) on traditional and modern building images. AutoStitch present an obvious seam line and ghosting in the results because of the global homograph. APAP and AANAP have better results with some ghosting. On the 20 different scene images,the peak signal-to-noise ratio and structural similarity index measure of our method are increased by 8.7% and 9.6%,respectively,making it the second best approach. Conclusion This study presents a novel method for high-precision parallax image stitching using feature point clustering. The experiment results show that our method,which constructs local homography transformation to shift feature points with the planar information clustering results,can increase the number of matched feature points,reduce the number of wrongly shifting and wrongly matching feature points,and increase the precision of feature point alignment compared with the state-of-the-art image stitching approaches of AutoStitch,APAP,and AANAP. The results also show that the proposed image stitching algorithm,which partitions images into cells and with blend weights for multiplane images,can achieve better image stitching performance with regard to pixel and image structure indexes than the state-of-the-art image stitching approaches of AutoStitch,APAP,and AANAP.

Key words

image stitching; image registration; hierarchical clustering; parallax image; local homography transformation; feature point match

0 引言

由于摄像设备视野的局限性，所获得的单幅图像信息量有限，通过传感器对同一场景不同视角拍摄，可以得到多幅具有重叠区域的图像。图像拼接利用待拼接图像间的重叠区域，将一系列窄视角图像合成为一幅高分辨率的宽视角图像。

目前提出了多种图像配准方法, 可以归纳为基于相位相关的图像拼接、基于灰度的图像拼接和基于特征点的图像拼接3类。其中基于特征点的图像拼接方法通过图像特征匹配点计算重叠区域的变换关系，适合大多数场景的图像拼接。

许多商业软件已实现了图像拼接，例如AutoStitch(automatic stitching)、ICE(image composite editor)和Photoshop。这些软件需要拍摄时相机光心近乎重合，使图像近似一个平面，否则就会因为物体间相对位置变化产生明显的拼接缝和伪影。日常生活中，相机拍摄时光心并不固定在同一个位置，存在移动、旋转等空间变化，捕获的图像往往存在视差。因此，针对具有视差的图像拼接研究成为了热点问题。

多视角图像拼接方法可以分为基于网格的图像拼接和基于最优缝合线的图像拼接两类。基于缝合线的方法能够很好地解决大视差问题，但采用全局单应矩阵筛选特征点，特征匹配对漏检率较高，没有显著性结构或者显著性区域特征点较少时会导致缝合线搜索失败。基于网格局部变换算法配准精度较高，是目前的主流方法之一。将图像重叠区域分为背景和前景两个平面，分别计算变换矩阵(Gao等，2011)，在一定程度上缓解了视差造成的拼接误差, 但是对于平面较为复杂的场景，该方法会产生伪影。为此，Zaragoza等人(2013)提出了APAP(as projective as possible)变换模型。该算法将待拼接图像划分为大小相同的密集网格，使用带权重线性变换算法计算每一个网格的局部变换矩阵，这种方法对于大部分图像都有较好的拼接效果，但对于图像中包含较少特征点的区域，仍有明显伪影。为此，提出了保持形状的半投影变换方法(Chang等，2014)，将图像划分为重叠区域、过渡区域以及非重叠区域，变换模型从投影变换逐渐过渡到相似变换，并利用全局相似变换约束，减小了投影失真。Lin等人(2015)提出了自适应的AANAP(adaptive as-natural-as-possible)拼接方法，线性化无重叠区域的投影变换，利用重叠区域的匹配点自动估计全局相似变换。在重叠区域采用投影变换和相似变换，在非重叠区域采用相似变换和仿射变换，缓解了拼接过程中的投影失真。国内外学者提出了多种APAP模型的改进算法，如引入全局相似变换处理全局失真(Chen和Chuang，2016；Xiang等，2018)，直线约束(Xiang等，2016；何川和周军，2018)。基于分块的图像拼接算法(张晶晶等，2018)将图像中不同物体分割为不同区域，对每个区域分别利用随机抽样一致性(random sample consensus，RANSAC)(Raguram等，2013)算法筛选特征点。该算法虽然计算了多个不同的变换矩阵，但对于图像中特征点较少的区域，没有足够的特征点计算单应变换矩阵。基于显性子平面的图像拼接方法(薛佳乐等，2018)利用特征点间的距离作为层次聚类的簇间距离，将图像划分为多个平面计算单应矩阵，但距离较近的特征点可能不属于同一平面，导致拼接图像存在伪影。

综上所述，虽然基于网格的多视角图像拼接方法具有良好的拼接效果，但存在配准精度问题：在图像拼接过程中，基于网格的局部变换算法的配准精度依赖于特征点的均匀度和匹配的准确度，该模型采用RANSAC算法(Raguram等，2013)，利用全局单应变换筛选特征点, 剔除了较多不属于该单应变换的正确匹配点，影响了配准精度。

不同视角下的待拼接图像，重叠区域间物体的相对位置有所变化，因此物体不同平面应对应不同的变换矩阵。根据特征点的平面信息聚类划分，同一个簇(或类)中的特征点之间具有较高的平面相似性，而不同簇中的特征点具有不同的平面相似性。为此，本文针对具有视差的多平面图像，根据特征点的平面信息进行聚类划分，同一聚类的特征点使用相同的局部变换矩阵，不同聚类划分之间使用不同的局部变换矩阵，本质上起到了特征点数据降维的作用，可以提高图像配准和拼接的准确率。本文的贡献和创新点在于：1)提出了特征点的平面信息层次聚类的多平面特征点筛选算法；2)实现基于平面信息分配权重的线性变换计算局部单应矩阵的图像拼接，以降低特征点筛选的漏检率，提高图像配准的精度。

1 研究基础

对两幅待拼接图像$\mathit{\boldsymbol{A}} $和$ \mathit{\boldsymbol{B}}$分别利用尺度不变特征变换(scale-invariant feature transform，SIFT)(Lowe，1999)算法提取特征点并通过k-d(k-dimension)树进行粗匹配(余淮和杨文，2016)，匹配后的特征点集为$\boldsymbol{P}=\left\{\boldsymbol{P}_{1}, \boldsymbol{P}_{2}, \cdots, \boldsymbol{P}_{N}\right\}(N $为两幅图像特征点匹配的总对数)，其中特征点对$ \left. {{\mathit{\boldsymbol{P}}_i} = \left\{ {\mathit{\boldsymbol{z}}_i^A} \right., \mathit{\boldsymbol{z}}_i^B} \right\}(i = 1, 2, \cdots, N), \mathit{\boldsymbol{z}}_i^A = \left({x_i^A, y_i^A} \right){\rm{ }}$和$ \mathit{\boldsymbol{z}}_i^B = \left({x_i^B, y_i^B} \right)$表示第$ i$组匹配对在图像$\mathit{\boldsymbol{A}} $和$ \mathit{\boldsymbol{B}}$中对应的特征点坐标。

计算$ \mathit{\boldsymbol{z}}_i^B$映射到$ \mathit{\boldsymbol{z}}_i^A$的变换矩阵为

$ \mathit{\boldsymbol{\tilde z}}_i^A = \mathit{\boldsymbol{H\tilde z}}_i^B $

(1)

式中，$\mathit{\boldsymbol{\tilde z}}_i^A $和$\mathit{\boldsymbol{\tilde z}}_i^B $分别是$ \mathit{\boldsymbol{z}}_i^A$和$ \mathit{\boldsymbol{z}}_i^B$的齐次坐标，$ \boldsymbol{H} \in \mathbf{R}^{3 \times 3}$是单应变换矩阵。在非齐次坐标系下为

$ \left\{ \begin{array}{l} x_i^A = \frac{{{{[{\mathit{\boldsymbol{h}}^1}]}^{\rm{T}}}{{\left[ {\begin{array}{*{20}{l}} {x_i^B}&{y_i^B}&1 \end{array}} \right]}^{\rm{T}}}}}{{{{[{\mathit{\boldsymbol{h}}^2}]}^{\rm{T}}}{{\left[ {\begin{array}{*{20}{l}} {x_i^B}&{y_i^B}&1 \end{array}} \right]}^{\rm{T}}}}}\\ y_i^A = \frac{{{{[{\mathit{\boldsymbol{h}}^1}]}^{\rm{T}}}{{\left[ {\begin{array}{*{20}{l}} {x_i^B}&{y_i^B}&1 \end{array}} \right]}^{\rm{T}}}}}{{{{[{\mathit{\boldsymbol{h}}^3}]}^{\rm{T}}}{{\left[ {\begin{array}{*{20}{l}} {x_i^B}&{y_i^B}&1 \end{array}} \right]}^{\rm{T}}}}} \end{array} \right. $

(2)

式中，${{{\left[ {{\mathit{\boldsymbol{h}}^j}} \right]}^{\rm{T}}}} $是$\mathit{\boldsymbol{H}} $的第$ j$行($ j$=1, 2, 3)。采用交叉乘积，可以将式(1)重写为$ \mathit{\boldsymbol{\tilde z}}_i^A \times \mathit{\boldsymbol{H\tilde z}}_i^B = {{\mathbf{0}}_{3 \times 1}}$，即

$ \left[ {\begin{array}{*{20}{c}} {{{\mathbf{0}}_{1 \times 3}}}&{ - {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {{[\mathit{\boldsymbol{\tilde z}}_i^B]}^{\rm{T}}}}&{y_i^A{{[\mathit{\boldsymbol{\tilde z}}_i^B]}^{\rm{T}}}}\\ {{{[\mathit{\boldsymbol{\tilde z}}_i^B]}^T}}&{{{\mathbf{0}}_{1 \times 3}}}&{ - {\kern 1pt} {\kern 1pt} {\kern 1pt} x_i^A{{[\mathit{\boldsymbol{\tilde z}}_i^B]}^{\rm{T}}}}\\ { - {\kern 1pt} {\kern 1pt} {\kern 1pt} y_i^A{{[\mathit{\boldsymbol{\tilde z}}_i^B]}^{\rm{T}}}}&{x_i^A{{[\mathit{\boldsymbol{\tilde z}}_i^B]}^{\rm{T}}}}&{{{\mathbf{0}}_{1 \times 3}}} \end{array}} \right]\left[ {\begin{array}{*{20}{c}} {{\mathit{\boldsymbol{h}}^1}}\\ {{\mathit{\boldsymbol{h}}^2}}\\ {{\mathit{\boldsymbol{h}}^3}} \end{array}} \right] = {{\mathbf{0}}_{3 \times 1}} $

(3)

式(3)只有两行是线性无关的，因此，令$ {\mathit{\boldsymbol{a}}_i} \in {{\mathbf{R}}^{2 \times 9}}$为匹配点$ {P_i} = \left\{ {\mathit{\boldsymbol{z}}_i^A, \mathit{\boldsymbol{z}}_i^B} \right\}$对应的方程组系数，即

$ {\mathit{\boldsymbol{a}}_i} = \left[ {\begin{array}{*{20}{c}} {{{\mathbf{0}}_{1 \times 3}}}&{ - \mathit{\boldsymbol{z}}_i^{B{\kern 1pt} {\rm{T}}}}&{y_i^A\mathit{\boldsymbol{z}}_i^{B{\kern 1pt} {\rm{T}}}}\\ {\mathit{\boldsymbol{z}}_i^{B{\kern 1pt} {\rm{T}}}}&{{{\mathbf{0}}_{1 \times 3}}}&{ - x_i^A\mathit{\boldsymbol{z}}_i^{B{\kern 1pt} {\rm{T}}}} \end{array}} \right] $

则$\boldsymbol{h} \in \mathbf{R}^{9 \times 1} $的计算公式为

$ \mathit{\boldsymbol{Ch}} = {\mathbf{0}} $

(4)

式中，$ \mathit{\boldsymbol{C}} \in {{\mathbf{R}}^{2N \times 9}}$是所有特征点的系数。单应性矩阵带有8个未知参数，当有4对匹配点时，式(4)有唯一解；当匹配对多于4组时，式(4)为超定方程组，不存在精确解，因此采用直接线性变换算法估算$\mathit{\boldsymbol{h}} $，即

$ \mathit{\boldsymbol{h}} = {\rm{arg}}{\kern 1pt} {\kern 1pt} {\kern 1pt} \mathop {{\rm{min}}}\limits_\mathit{\boldsymbol{h}} \sum\limits_{i = 1}^N {{{\left\| {{\mathit{\boldsymbol{a}}_i}\mathit{\boldsymbol{h}}} \right\|}^2}} = {\rm{arg}}{\kern 1pt} {\kern 1pt} {\kern 1pt} \mathop {{\rm{min}}}\limits_\mathit{\boldsymbol{h}} {\left\| {\mathit{\boldsymbol{Ch}}} \right\|^2} $

(5)

式(5)满足$\parallel \mathit{\boldsymbol{h}}\parallel = 1 $，该解是$\mathit{\boldsymbol{C}} $的最小右奇异值向量。

2 基于特征点平面相似性聚类的图像拼接

通过自底向上的聚类将特征点划分为不同平面，筛选误匹配点。再将图像划分为大小相等的网格，通过带权重的线性变换算法计算网格的局部单应变换。最后利用多频率融合方法融合已配准的图像。

2.1 基于平面相似性聚类的特征点筛选

与RANSAC算法采用全局单应变换筛选匹配点不同，本文利用局部单应变换计算匹配点之间的平面相似性度量值，采用凝聚层次聚类算法将匹配点聚类，从而划分为不同的平面类别，筛选误匹配点。通过特征点平面信息聚类构造局部单应变换矩阵筛选特征点，可以增加匹配的特征点个数, 减少误筛选个数和误匹配个数，增加特征点配准的精度。

进行特征点匹配对采样，构建$ K$个最小采样集。假设每个最小采样集的单应性矩阵都对应一个平面变换，而距离较近的特征点位于同一个平面的概率较大。因此，本文利用特征点间距离度量进行采样。采样规则如下：从特征点集$ \left\{ {\mathit{\boldsymbol{z}}_1^A, \mathit{\boldsymbol{z}}_2^A, \cdots, \mathit{\boldsymbol{z}}_N^A} \right\}$中随机选择1个点，在该点的邻域范围内随机选择其他3个点，若4个点不共线，则构成一个最小采样集。

根据式(4)分别计算$ K$个最小采样集的单应性矩阵，得到$\boldsymbol{H}^{*} \in \mathbf{R}^{9 \times K}=\left[\boldsymbol{h}_{1}, \boldsymbol{h}_{2}, \cdots, \boldsymbol{h}_{K}\right] $。

计算$d(i, j) $, 构建残差矩阵$ \boldsymbol{d} \in \mathbf{R}^{N \times K}$，即

$ d(i,j) = |{\mathit{\boldsymbol{h}}_j} \times \mathit{\boldsymbol{\tilde z}}_{\rm{i}}^A - \mathit{\boldsymbol{\tilde z}}_{\rm{i}}^B| + |\mathit{\boldsymbol{h}}_j^{( - 1)} \times \mathit{\boldsymbol{\tilde z}}_{\rm{i}}^B - \mathit{\boldsymbol{\tilde z}}_{\rm{i}}^A| $

(6)

式中，$d(i, j) $值越小，说明匹配对$\left\{z_{\mathrm{i}}^{A}, z_{\mathrm{i}}^{B}\right\} $越接近变换矩阵$ {\mathit{\boldsymbol{h}}_j}$所对应的平面。将残差矩阵的每一个行向量$ \boldsymbol{d}(i, :)(i=1, 2, \cdots, N)$升序排列得到$ \mathit{\boldsymbol{d'}} \in {{\mathbf{R}}^{N \times K}}$，保留${\mathit{\boldsymbol{d'}}} $前20%的分量构建矩阵$\mathit{\boldsymbol{d^{*}}} \in {{\mathbf{R}}^{N \times t}}(t = 0.2 \times K) $

$ {\mathit{\boldsymbol{d}}^*}(i,:) = {\mathit{\boldsymbol{d}}^\prime }(i,1:t)(i = 1,2, \cdots ,N) $

(7)

构建矩阵$ \boldsymbol{L} \in \mathbf{R}^{N \times t}, L(i, j)$是$d^{*} (i, j)$所对应的单应性矩阵的序号。

一般地，相同平面特征点匹配对符合相同的单应变换，不同平面特征点匹配对不会符合同一单应变换。两个不同匹配点对对应的单应性变换相同的越多，属于同一平面的概率越大，因而构建矩阵$ \mathit{\boldsymbol{F}} \in {{\mathbf{R}}^{N \times N}}$，$ F(i, j)$是$\boldsymbol{L}(i, :) $和$\boldsymbol{L}(j, :) $交集的个数。

计算特征点匹配对之间的平面相似性度量矩阵$\mathit{\boldsymbol{\gamma }} \in {{\mathbf{R}}^{N \times N}} $，即

$ \gamma (i,j) = \sum\limits_{k = 1}^N {{\kern 1pt} {\kern 1pt} {\kern 1pt} {{(F(i,k) - F(j,k))}^2}} $

(8)

根据两个特征点间的相似性度量矩阵$ \mathit{\boldsymbol{\gamma }}$，实现凝聚层次聚类算法(de Morsier等，2015)。将每一个特征点都作为一个独立的类簇，计算簇中所有点与另一个簇中所有点的平均距离，找到均值较小的两个簇合并，直到所有特征点合并完成，建立层次聚类树。根据层次聚类树可以将样本集划分为任意数量的类别。图 1为古建筑中部分特征点的层次聚类结果(仅显示30对匹配点，划分为5类)。

图 1 古建筑图像特征点的层次聚类树

Fig. 1 Traditional building image feature points hierarchical clustering tree

误匹配特征点匹配对不符合所有的单应变换，其聚集类中所有特征点对单应变换的残差都很大。计算每个类别中所有匹配点与其对应单应性变换的残差均值${\eta _m} $为

$ {\eta _m} = \sum\limits_{i{\kern 1pt} {\kern 1pt} \in {\kern 1pt} {\kern 1pt} \mathit{\boldsymbol{C}}_m^\prime } {\sum\limits_{k = 1}^t {{\mathit{\boldsymbol{d}}^*}} } (i,k)/M $

(9)

式中，$ \boldsymbol{C}_{m}^{\prime}$表示第$ m$个聚集类，$ M$是$ \boldsymbol{C}_{m}^{\prime}$类中匹配对的总个数。

根据均值${\eta _m} $筛选出误匹配点集。若${\eta _m} $的值大于阈值，则类中所有特征点为错误匹配对，否则，将该类包含的特征点加入到正确匹配点集中。

2.2 多平面图像拼接

基于网格的局部变换算法模型，将待拼接图像划分为网格并用带权重的线性变换算法计算每个网格的单应矩阵。模型中计算特征点与网格中心点的欧氏距离作为特征点对当前局部单应的权重。考虑到每个网格与其属于同一平面上的特征点具有相同的变换矩阵，将结合层次聚类过程中得到的平面信息重新定义权重，提高配准精度。

将待拼接图像$\mathit{\boldsymbol{A}} $和$ \mathit{\boldsymbol{B}}$划分为相同大小的单元网格，采用带权重的线性变换算法计算每个网格的单应性矩阵，在式(5)的基础上引入权重因子${\omega ^i} $，求解网格的变换参数$ \mathit{\boldsymbol{h}}$

$ \mathit{\boldsymbol{h}} = {\rm{arg}}{\kern 1pt} {\kern 1pt} {\kern 1pt} \mathop {{\rm{min}}}\limits_\mathit{\boldsymbol{h}} \sum\limits_{i = 1}^N {{{\left\| {{\omega ^i}{a_i}\mathit{\boldsymbol{h}}} \right\|}^2}} $

(10)

式中，$ \mathit{\boldsymbol{h}}$是单应性矩阵；$ {a_i}$是第$ i$组特征点匹配对对应的方程组系数；${\omega ^i} $是第$ i$对匹配点对当前局部单应矩阵的权重。权重因子${\omega ^i} $计算为

$ {\omega ^i} = \left\{ {\begin{array}{*{20}{l}} {1{\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} l({\mathit{\boldsymbol{P}}_i}) = l({\mathit{\boldsymbol{P}}_*})}&{}\\ {{\rm{max}}(\exp ( - {{\left\| {{\mathit{\boldsymbol{P}}_i} - {\mathit{\boldsymbol{P}}_*}} \right\|}^2}/{\sigma ^2}),\mathit{\boldsymbol{\rho }})}&{{\rm{ 其他 }}} \end{array}} \right. $

(11)

式中，$ {\mathit{\boldsymbol{P}}_*}$是网格中心点的坐标, $\sigma $是基于高斯核径向距离的尺度因子, $\rho \in \left[ {0, 1} \right] $是为了避免特征点权重过于接近0而进行的规格化操作, $l({\mathit{\boldsymbol{P}}_i}) $和$l({\mathit{\boldsymbol{P}}_*})$是特征点和网格中心点所属的平面标签。

$ {\mathit{\boldsymbol{P}}_*}$所属平面由该点邻域的特征点平面决定，在其邻域范围内统计属于每个平面的特征点个数，取特征点个数最大的平面为该点所属平面。

计算每个网格的单应矩阵，将图像$ \mathit{\boldsymbol{B}}$映射到图像$\mathit{\boldsymbol{A}} $的坐标系下，利用多频带融合方法(Brown和Lowe，2007)进行融合。

综上所述，本文提出的解决大视差的图像拼接算法流程为：

输入：待拼接图像$\mathit{\boldsymbol{A}} $和$ \mathit{\boldsymbol{B}}$, 采样集个数$ K$。

输出：拼接图像$ \mathit{\boldsymbol{G}}$。

1) 提取图像$\mathit{\boldsymbol{A}} $和$ \mathit{\boldsymbol{B}}$的SIFT特征点。

2) 通过k-d树计算特征点的粗匹配集$\left\{ {\left({\mathit{\boldsymbol{z}}_1^A} \right.} \right., \left. {\left. {\mathit{\boldsymbol{z}}_1^B} \right), \left({\mathit{\boldsymbol{z}}_2^A, \mathit{\boldsymbol{z}}_2^B} \right), \cdots, \left({\mathit{\boldsymbol{z}}_N^A, \mathit{\boldsymbol{z}}_N^B} \right)} \right\} $。

3) 图像$\mathit{\boldsymbol{A}} $的特征点$\mathit{\boldsymbol{z}}_i^A(i = 1, 2, \cdots, N) $按照采样规则构造 $ K$个包含4对匹配特征点对的最小采样集。

4) 根据式(5), 计算每个最小采样集对应的单应性矩阵$\boldsymbol{h}_{i}(1 \leqslant i \leqslant K) $。

5) 根据式(6), 计算残差矩阵$\boldsymbol{d} \in \mathbf{R}^{N \times K} $。

6) 根据式(7), 构建矩阵${\mathit{\boldsymbol{d^{*}}}} $。

7) 记$ \boldsymbol{L}(i, j)=\boldsymbol{d}^{*}(i, j)$对应单应性矩阵${\mathit{\boldsymbol{h}}_j} $的序号$ j$, 构建矩阵$ \boldsymbol{L} \in \mathbf{R}^{N \times t}$。

8) 记$ \boldsymbol{F}(i, j)=\boldsymbol{L}(i, :)$和$ \boldsymbol{L}(j, :)$具有相同序号的个数, 构建矩阵$ \mathit{\boldsymbol{F}} \in {{\mathbf{R}}^{N \times N}}$。

9) 根据式(8), 计算不同特征匹配对间的平面相似性度量矩阵$\mathit{\boldsymbol{\gamma }} \in {{\mathbf{R}}^{N \times N}} $。

10) 将矩阵$\mathit{\boldsymbol{\gamma }} \in {{\mathbf{R}}^{N \times N}} $作为凝聚层次聚类距离，根据平均距离合并每个簇，聚类划分特征点匹配对。

11) 根据式(9), 计算每个类别中所有特征匹配对与其对应单应性变换的残差均值${\eta _m} $，并将残差均值小于阈值的类别中所有的特征匹配对加入特征点匹配集$ {\mathit{\boldsymbol{\phi}} _*}$。

12) 将图像划分为大小100×100的网格，根据式(10)和集合$ {\mathit{\boldsymbol{\phi}} _*}$计算每个网格的单应性矩阵。

13) 根据单应性矩阵，图像$ \mathit{\boldsymbol{B}}$变换到图像$\mathit{\boldsymbol{A}} $的坐标系下, 并用多频带融合法融合得到拼接图像$ \mathit{\boldsymbol{G}}$。

3 实验结果及分析

实验环境为：Intel(R) Core(TM) i7-2600 CPU @3.40 GHz，4.00 GB内存，Windows7操作系统，MATLAB语言。

3.1 特征点筛选实验

为了验证本文特征点筛选算法的有效性，将其与应用广泛的随机抽样一致性算法RANSAC (Raguram等，2013)对比。本文算法的采样集个数与RANSAC算法同为$K=4N $，层次聚类的类别数为8，残差均值阈值为0.02。

特征点匹配对筛选结果如图 2和图 3所示，其中绿色为匹配的特征点对，红色为筛选的特征点对。图 2是重叠区域具有较少平面的简单场景，从图中可以看出，RANSAC将树木部分的正确匹配点(红色矩形框)误认为是外点，这是由于建筑和地面具有大量的SIFT特征点，树木中提取的特征点较少，该算法在筛选匹配点时，计算的单应变换是全局的，而树木、建筑和地面属于不同的平面，对应不同的单应变换，因此该算法忽略了树木所在平面。图 3是重叠区域具有多个平面的复杂场景。从图中可以看出，RANSAC算法将凉亭顶部的外檐部分、亭子的台面以及亭后的树中的匹配点(红色矩形框)误筛选，由于平面的复杂性，该算法选择包含特征点最多的单应变换，筛选出不属于该变换的点，因此会忽略很多较小平面上的特征点。而本文算法对于图 2和图 3都保留了较多的正确匹配点，本文根据相同平面特征点对应同一单应变换，计算特征点间的相似性度量并将其分为不同的平面，不属于任何平面的点被视为是误匹配点，因此具有较好的效果。

图 2 古建筑的特征点匹配对筛选

Fig. 2 Traditional image match point detecting results by different algorithms ((a) RANSAC; (b) ours)

图 3 凉亭的特征点匹配对筛选结果

Fig. 3 Pavilion image match point detecting results by different algorithms ((a) RANSAC; (b) ours)

表 1是RANSAC算法和本文算法对2幅不同场景的待拼接图像对比，统计了匹配点误筛选和误匹配对的个数。由于凉亭组中天空和云彩的特征点相似，两种算法都存在误匹配对，而本文算法误筛选的匹配对个数较少。

表 1 各算法在2个场景下图像匹配对筛选结果
Table 1 Results of image match points detecting algorithms on 2 scenes

下载CSV

图像	粗匹配个数	算法	匹配个数	误筛选个数	误匹配个数
古建筑	482	RANSAC	427	8	6
古建筑	482	本文	435	2	1
凉亭	817	RANSAC	541	43	20
凉亭	817	本文	589	1	1
注：加粗字体表示最优结果。

表 2是在20个不同场景下，RANSAC算法和本文算法图像匹配对筛选的性能对比，RANSAC算法的平均误筛选个数为30，平均误匹配个数为8，而本文方法的平均误筛选个数为3，平均误匹配个数为2。可见，本文算法能够保留更多的正确特征点匹配对。

表 2 各算法在20个场景下图像匹配对筛选平均性能
Table 2 Average performance of image match point detecting algorithms on 20 different scenes

下载CSV

算法	平均误筛选个数	平均误匹配个数
RANSAC	30	8
本文	3	2
注：加粗字体表示最优结果。

3.2 图像拼接实验

为了验证本文图像拼接算法的有效性，本文算法与AutoStitch算法(Liu等，2009)，APAP算法(Zaragoza等，2013)和AANAP算法(Lin等，2015)对比。本文实验的图像数据选取了20幅不同场景的图像。本文算法设置参数与APAP算法、AANAP算法参数相同，参数$ \sigma=12, \gamma=0.0015$。

图 4是古建筑图像的拼接结果，从结果图来看，AutoStitch算法的结果在地面的弧线、房顶和树等多个区域都有明显的拼接缝和重影，由于该算法采用全局单应变换，所有像素点都通过同一变换映射，因此会造成大部分重影。APAP算法和AANAP算法拼接效果较好，但通过细节图可以看出，对于特征点少的树木和路灯，都有明显重影，由于两种算法在划分网格计算局部单应变换时，采用特征点的距离计算权重，树木和路灯周围的特征点较少，其他特征点匹配对与这两个区域的距离较大，因此存在伪影。本文算法的拼接结果，不仅整体对齐度较高，而且在特征点较少的区域也具有很好的拼接效果。本文利用了网格的平面信息，由于同一平面对应相同的变换，因此对于树木和路灯，可以根据与其同属一个平面的特征点计算该区域的变换矩阵，获得较好的拼接效果。

图 4 古建筑拼接结果及局部细节

Fig. 4 Stitching results and location amplifying detail of the selected regions by different algorithms for the traditional building image ((a) viewpoint 1;(b) viewpoint 2; (c) AutoStitch; (d) APAP; (e) AANAP; (f) ours)

图 5是楼宇图像的拼接结果和选择区域的局部放大细节，从结果图来看，AutoStitch算法的拼接结果在楼梯和楼的边缘等多个区域具有较为严重的重影现象。APAP算法和AANAP算法在房子顶部边缘的拼接相对AutoStitch算法较好，但是在十字、楼梯和房子侧边缘由于周围特征点较少，存在伪影。本文算法相对其他3种算法，在上述区域均未出现重影和拼接缝，拼接效果较好。

图 5 楼宇拼接结果及局部细节

Fig. 5 Stitching results and location amplifying detail of the selected regions by different algorithm for modern building image ((a) viewpoint 1; (b) viewpoint 2; (c) AutoStitch; (d) APAP; (e) AANAP; (f) ours)

为了定量评价相关算法的性能，本文选取了峰值信噪比(peak signal to noise ratio，PSNR)和结构相似性(structural similarity，SSIM)(Sheikh等，2006；Wang等，2004)指标衡量算法的拼接性能。

PSNR是利用像素误差衡量图像失真的标准，两幅图像之间PSNR值越大则越相似，即

$ {P_1} = 10 \times {\rm{lg}}\left( {\frac{{{{({2^8} - 1)}^2}}}{{{M_1}}}} \right) $

(12)

式中，${M_1} = \frac{1}{{{W_1}{W_2}}}\sum\limits_{i = 0}^{{W_1} - 1} {\sum\limits_{j = 0}^{{W_2} - 1} {{{\left({{I_A}(i, j) - {I_B}(i, j)} \right)}^2}} } $为均方误差，$ {\mathit{\boldsymbol{I}}_A}$为图像$\mathit{\boldsymbol{A}} $的重叠区域，$ {\mathit{\boldsymbol{I}}_B}$为图像$ \mathit{\boldsymbol{B}}$单应性变换后的重叠区域。$ {W_1}, {W_2}$分别是重叠区域的长和宽。

SSIM是利用图像间的亮度、对比度以及结构3个因素衡量图像的相似度，SSIM值越大，则图像越相似，即

$ S({\mathit{\boldsymbol{I}}_A},{\mathit{\boldsymbol{I}}_B}) = \frac{{(2{\mu _{{I_A}}},{\mu _{{I_B}}} + {c_1})(2{\sigma _{{I_A}{I_B}}} + {c_2})}}{{(\mu _{{I_A}}^2 + \mu _{{I_B}}^2 + {c_1})(\sigma _{{I_A}}^2 + \sigma _{{I_B}}^2 + {c_2})}} $

(13)

式中，$ \mu_{I_{A}}$ 和$ \mu_{I_{B}}$ 分别是$ {\mathit{\boldsymbol{I}}_A}$和$ {\mathit{\boldsymbol{I}}_B}$的灰度平均值，$ \sigma_{l_{A}}^{2}$和$ \sigma_{l_{B}}^{2}$ 分别是$ {\mathit{\boldsymbol{I}}_A}$和$ {\mathit{\boldsymbol{I}}_B}$的灰度方差，$ \sigma_{I_{A} I_{B}}$ 是$ {\mathit{\boldsymbol{I}}_A}$和$ {\mathit{\boldsymbol{I}}_B}$的协方差，${c_1} $和$ {c_2}$是维持稳定的常数。

表 3为AutoStitch、APAP、AANAP和本文方法针对不同场景图像在峰值信噪比和结构相似性指标上的对比结果，本文方法的PSNR和SSIM都为最优。

表 3 各算法在2个场景下图像拼接结果
Table 3 Results of image stitching algorithms on 2 scenes

下载CSV

图像	算法	PSNR/dB	SSIM
古建筑	AutoStitch	25.579 6	0.787 2
	APAP	26.272 3	0.813 0
	AANAP	26.399 8	0.832 1
	本文	29.177 3	0.942 7
楼宇	AutoStitch	20.583 5	0.762 0
	APAP	21.264 0	0.809 8
	AANAP	21.844 9	0.839 6
	本文	24.273 6	0.900 5
注：加粗字体表示最优结果。

由表 4可知，在20个不同场景下图像拼接，本文方法相比性能第2的算法, PSNR平均提高了8.7%, SSIM平均提高了9.6%。

表 4 各算法在20个场景下图像拼接的平均性能
Table 4 Average performance of different image stitching algorithms on 20 different scenes

下载CSV

算法	平均PSNR/dB	平均SSIM
AutoStitch	22.435 6	0.784 2
APAP	23.786 1	0.816 3
AANAP	23.881 2	0.840 9
本文	25.962 2	0.921 5
注：加粗字体表示最优结果。

综上可见，本文方法的拼接图像无论在单个像素的相似度还是在结构相似度方面都比其他3种算法好，说明本文方法能够很好地对齐重叠区域，提高配准精度。

4 结论

提出了一种基于层次聚类的多平面图像拼接方法，根据同一平面对应相同变换计算特征间的相似性度量，利用层次聚类筛选图像的特征匹配点，结合平面信息计算局部单应变换。实验结果表明，本文方法可以充分保留图像的正确匹配点，与AutoStitch、APAP、AANAP等图像拼接方法相比，减少了图像的拼接缝和重影，提高了配准精度，具有较好的拼接效果。

由于本文主要考虑的是静态图像拼接，对于运动的相机和场景图像或视频拼接存在不连贯的问题，今后将研究相机和场景拼接中的运动补偿机制, 设计更加高级的几何和光照模型，以适应更加复杂的使用条件。

参考文献

Brown M, Lowe D G. 2007. Automatic panoramic image stitching using invariant features. International Journal of Computer Vision, 74(1): 59-73 [DOI:10.1007/s11263-006-0002-3]

Chang C H, Sato Y and Chuang Y Y. 2014. Shape-preserving half-projective warps for image stitching//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus: IEEE: 3254-3261[DOI:10.1109/CVPR.2014.422]

Chen Y S and Chuang Y Y. 2016. Natural image stitching with the global similarity prior//Proceedings of the 14th European Conference on Computer Vision. Amsterdam: Springer: 186-201[DOI:10.1007/978-3-319-46454-1_12]

de Morsier F, Tuia D, Borgeaud M, Gass V, Thiran J P. 2015. Cluster validity measure and merging system for hierarchical clustering considering outliers. Pattern Recognition, 48(4): 1478-1489 [DOI:10.1016/j.patcog.2014.10.003]

Gao J H, Kim S J and Brown M S. 2011. Constructing image panoramas using dual-homography warping//Proceedings of 2011 IEEE Conference on Computer Vision and Pattern Recognition. Providence: IEEE: 49-56[DOI:10.1109/CVPR.2011.5995433]

He C, Zhou J. 2018. Mesh-based image stitching algorithm with linear structure protection. Journal of Image and Graphics, 23(7): 973-983 (何川, 周军. 2018. 具有直线结构保护的网格化图像拼接. 中国图象图形学报, 23(7): 973-983) [DOI:10.11834/jig.170653]

Lin C C, Pankanti S U, Ramamurthy K N and Aravkin A Y. 2015. Adaptive as-natural-as-possible image stitching//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE: 1155-1163[DOI:10.1109/CVPR.2015.7298719]

Liu F, Gleicher M, Jin H L and Agarwala A. 2009. Content-preserving warps for 3D video stabilization. ACM Transactions on Graphics: 28(3): #44[DOI:10.1145/1531326.1531350]

Lowe D G. 1999. Object recognition from local scale-invariant features//Proceedings of the 7th IEEE International Conference on Computer Vision. Kerkyra: IEEE: 1150-1157[DOI:10.1109/ICCV.1999.790410]

Raguram R, Chum O, Pollefeys M, Matas J, Frahm J M. 2013. USAC:a universal framework for random sample consensus. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8): 2022-2038 [DOI:10.1109/tpami.2012.257]

Sheikh H R, Sabir M F, Bovik A C. 2006. A statistical evaluation of recent full reference image quality assessment algorithms. IEEE Transactions on Image Processing, 15(11): 3440-3451 [DOI:10.1109/tip.2006.881959]

Wang Z, Bovik A C, Sheikh H R, Simoncelli E P. 2004. Image quality assessment:from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4): 600-612 [DOI:10.1109/TIP.2003.819861]

Xiang T Z, Xia G S, Bai X, Zhang L P. 2018. Image stitching by line-guided local warping with global similarity constraint. Pattern Recognition, 83: 481-497 [DOI:10.1016/j.patcog.2018.06.013]

Xiang T Z, Xia G S, Zhang L P and Huang N N. 2016. Locally warping-based image stitching by imposing line constraints//Proceedings of the 23rd International Conference on Pattern Recognition. Cancun: IEEE: 4178-4183[DOI:10.1109/ICPR.2016.7900289]

Xue J L, Zhao M, Zhang Z, Cheng X, Chen S Y. 2018. Dominant sub-plane registration algorithm for large parallax image stitching. Journal of Image and Graphics, 23(3): 323-332 (薛佳乐, 赵萌, 张哲, 程徐, 陈胜勇. 2018. 针对大视差图像拼接的显性子平面配准. 中国图象图形学报, 23(3): 323-332) [DOI:10.11834/jig.170518]

Yu H, Yang W. 2016. A fast feature extraction and matching algorithm for unmanned aerial vehicle images. Journal of Electronics & Information Technology, 38(3): 509-516 (余淮, 杨文. 2016. 一种无人机航拍影像快速特征提取与匹配算法. 电子与信息学报, 38(3): 509-516) [DOI:10.11999/JEIT150676]

Zaragoza J, Chin T J, Brown M S and Suter D. 2013. As-projective-as-possible image stitching with moving DLT//Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland: IEEE: 2339-2346[DOI:10.1109/CVPR.2013.303]

Zhang J J, Zhai D H, Huang L Z, Yu Q. 2018. Parallax image stitching algorithm based on feature blocking. Computer Engineering, 44(5): 220-226 (张晶晶, 翟东海, 黄莉芝, 喻强. 2018. 基于特征分块的视差图像拼接算法. 计算机工程, 44(5): 220-226) [DOI:10.19678/j.issn.1000-3428.0046334]