基于互校验加权光流的视频三维重建关键帧提取方法
Key frame extraction method for video 3D reconstruction based on mutual check weighted optical flow
2025, pp. 1-18
Received: 2025-01-06
Revised: 2025-03-06
Accepted: 2025-03-10
Published online: 2025-03-12
DOI: 10.11834/jig.250009
DOI: 10.11834/jig.250009
Objective
Visual matching navigation requires pre-built three-dimensional point cloud information of the scene. Compared with mapping and modeling by traditional software and professional instruments, visual modeling from video-stream data captured by consumer-grade terminals offers low cost, convenient data updating, and wide spatial coverage. However, the huge number of video frames causes image redundancy, which leads to high computational cost of 3D model reconstruction, large accumulated error, and even reconstruction failure. This paper therefore proposes a key frame extraction method for 3D reconstruction based on mutual-check weighted optical flow.
Method
First, gyroscope data from the device sensors are used to pre-classify the scenes in the video stream. Then, the SIFT (Scale Invariant Feature Transform) algorithm detects image feature points and descriptors, and FLANN (Fast Library for Approximate Nearest Neighbors) matching is combined with the pyramid LK (Lucas-Kanade) optical flow algorithm to capture the dynamic changes between adjacent frames; feature points detected successfully by both algorithms are extracted, their Euclidean distances are computed, and strong match pairs between adjacent frames are screened out. Finally, based on the scene pre-classification result, the strong match pairs near the image vanishing point are weighted with a Gaussian kernel on straight roads and uniformly on turning roads; the total motion of the inter-frame optical flow field is computed to obtain the similarity, and the video key frames are finally extracted.
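The mutual-check screening step above can be sketched as follows. This is a minimal illustration assuming the FLANN match positions, the LK-tracked positions, and the LK status flags have already been computed for the same source points; the 3-pixel distance threshold and the function name `mutual_check_filter` are illustrative assumptions, not values from the paper:

```python
import numpy as np

def mutual_check_filter(pts_prev, pts_flann, pts_lk, status_lk, dist_thresh=3.0):
    """Keep only 'strong' match pairs: source points that FLANN matching and
    pyramid LK tracking both located in the next frame, and whose two
    predicted positions lie within dist_thresh pixels of each other."""
    pts_prev = np.asarray(pts_prev, dtype=float)
    pts_flann = np.asarray(pts_flann, dtype=float)
    pts_lk = np.asarray(pts_lk, dtype=float)
    ok = np.asarray(status_lk, dtype=bool)          # LK tracking succeeded
    # 2D Euclidean distance between the two predictions for each point
    dist = np.linalg.norm(pts_flann - pts_lk, axis=1)
    keep = ok & (dist < dist_thresh)
    return pts_prev[keep], pts_flann[keep]
```

Points that only one of the two methods locates, or on which the two methods disagree, are treated as wrong matches and discarded.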
Result
Four groups of data from different scenes were self-collected with a consumer-grade terminal, and the proposed algorithm was compared with traditional key frame extraction algorithms: the number of extracted key frames was counted, the structural similarity index was used to count highly similar frames, the extraction results on straight and turning roads were compared with the original video frames, and 3D model reconstruction experiments were finally carried out to evaluate the extraction quality. Experimental results show that the proposed algorithm reduces the total number of video frames to about 10%, with markedly fewer highly similar frames than the other algorithms; compared with straight roads, key frames account for a larger proportion at turns, which matches the expected demands of 3D reconstruction. The completeness of the final reconstructed model is 100%, 100%, 97.46%, and 96.54% on the four data groups, outperforming the other algorithms.
Conclusion
The proposed key frame extraction method for 3D reconstruction based on mutual-check weighted optical flow effectively reduces the number of video frames; the screened key frames improve the matching accuracy and stability of adjacent frames and enhance the robustness of 3D reconstruction in diverse scenes.
Objective
Navigation and positioning technology has become an indispensable part of daily life, and satellite positioning systems have been successfully built and widely deployed. However, where buildings are dense or satellite signals are blocked indoors, satellite positioning becomes inaccurate due to signal interference. Visual navigation and positioning technology, which determines the position and attitude of the camera in three-dimensional space through image processing and computer vision, has therefore emerged as an important means of solving the navigation and positioning problem in satellite-denied scenes. Visual matching navigation usually requires pre-built three-dimensional point cloud information of the scene. Three-dimensional point cloud models are mainly acquired in three ways: manual construction with mathematical modeling software, mapping with professional instruments, and crowdsourced mapping with consumer terminals. The first two are time-consuming and laborious for building large-scale feature point cloud databases, whereas visual modeling of video stream data from consumer terminals offers low cost, convenient data updating, and wide spatial coverage. However, the huge number of video frames introduces image redundancy, which makes the computational cost of 3D model reconstruction high and the cumulative error large, and can even cause reconstruction failure. This paper therefore proposes a 3D reconstruction key frame extraction method based on mutual-check weighted optical flow.
Method
First, the image scenes are pre-classified. During video shooting the camera passes through multiple scenes, and the video frames change differently in each: on a straight road the pixel changes are mainly distributed at the image edges, whereas at a turn the pixels of the whole image change, so the video needs to be pre-classified. The system receives the self-collected video stream and, combined with the gyroscope data from the mobile terminal, divides the scenes into two types, straight road and turning road, which provides the basis for the subsequent targeted optical flow aggregation and adjacent-frame similarity calculation. Then, cross-checked adjacent-frame matching is carried out. The SIFT (Scale Invariant Feature Transform) algorithm detects feature points and their descriptors in the previous frame; the corresponding points in the next frame are computed by FLANN (Fast Library for Approximate Nearest Neighbors) matching and by the pyramid LK (Lucas-Kanade) optical flow method, respectively. Feature points successfully matched by both methods are retained, wrong matches are eliminated by computing the two-dimensional Euclidean distance between the two predictions, and strong match pairs are obtained to capture the dynamic changes between frames, ensuring the accuracy and effectiveness of the subsequent optical flow calculation. Finally, the total optical flow field is aggregated and the similarity of adjacent frames is calculated. Because the video frames of straight and turning roads vary differently, the matched feature points of adjacent frames also contribute differently to the similarity calculation, so the optical flow must be aggregated with weighting.
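The gyroscope-based scene pre-classification might be sketched as follows. The yaw-rate threshold, the smoothing window, and the function name `classify_scene` are illustrative assumptions rather than details from the paper:

```python
import numpy as np

def classify_scene(yaw_rates, turn_thresh=0.15, window=30):
    """Label each frame 'straight' or 'turning' from the gyroscope yaw rate
    (rad/s), smoothing with a moving average to suppress sensor jitter.
    turn_thresh and window are illustrative values."""
    yaw = np.abs(np.asarray(yaw_rates, dtype=float))
    kernel = np.ones(window) / window
    smoothed = np.convolve(yaw, kernel, mode="same")   # moving-average filter
    return np.where(smoothed > turn_thresh, "turning", "straight")
```

A sustained yaw rate above the threshold marks a turning segment; everything else is treated as straight road.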
After the vanishing point of the image is detected and taken as the center, different weights are assigned to the strong match pairs near the vanishing point according to the scene classification, and the aggregate of the total optical flow field of adjacent frames is then weighted to obtain the inter-frame optical flow change. Images whose motion change exceeds a set threshold are judged to be key frames, and the key frames of the video stream are finally extracted.
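The scene-dependent weighted aggregation can be illustrated as below. The Gaussian bandwidth `sigma`, the choice of centering the Gaussian on the vanishing point, and the weight normalization are assumptions for illustration only; the paper's exact weighting scheme may differ:

```python
import numpy as np

def weighted_flow_magnitude(pts_prev, pts_next, vanishing_pt, scene, sigma=100.0):
    """Aggregate the total optical flow between two frames.  On straight
    roads, strong match pairs are weighted by a Gaussian centered on the
    vanishing point; on turning roads all pairs are weighted uniformly.
    sigma is an illustrative bandwidth, not a value from the paper."""
    pts_prev = np.asarray(pts_prev, dtype=float)
    pts_next = np.asarray(pts_next, dtype=float)
    flow = np.linalg.norm(pts_next - pts_prev, axis=1)   # per-point flow magnitude
    if scene == "straight":
        d = np.linalg.norm(pts_prev - np.asarray(vanishing_pt, float), axis=1)
        w = np.exp(-d ** 2 / (2 * sigma ** 2))           # Gaussian weight around VP
    else:                                                # "turning"
        w = np.ones(len(pts_prev))
    w = w / w.sum()                                      # normalize weights
    return float(np.sum(w * flow))
```

The resulting scalar serves as the inter-frame motion measure; frames whose accumulated motion crosses the threshold become key frame candidates.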
Result
In the experiment, a video acquisition app developed by the research group was used on a consumer terminal to self-collect data from the target environment. In scenes with different lighting and ground-feature distributions, four groups of data were obtained using different travel routes and speeds, each recording scene video and gyroscope data synchronously. The proposed algorithm was compared with traditional key frame extraction algorithms: each algorithm screened the video key frames, the number of extracted key frames was counted, and the structural similarity index measure (SSIM) was used to count highly similar frames among the key frames. Highly similar frames are pairs of extracted key frames with high visual similarity; the larger their number, the greater the redundancy of the extracted key frames. The key frame extraction results on straight and turning roads were then compared with the original video frames, and the proportion of key frames extracted from the original video frames was calculated for each scene to evaluate the scene adaptability of the algorithm. Finally, the extracted key frames were used in 3D model reconstruction experiments, and the road map was drawn with GNSS data to evaluate the completeness of the reconstructed model. Experimental results show that the proposed algorithm reduces the total number of video frames to about 10% (at minimum 4.56%), while the proportion of highly similar frames among the key frames is below 3% (at minimum 1.91%), significantly less than the other algorithms.
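Counting highly similar frames with SSIM can be sketched as follows. This uses a simplified single-window SSIM computed over whole images (the standard SSIM averages local sliding windows) and an illustrative 0.9 threshold; neither choice is taken from the paper:

```python
import numpy as np

def global_ssim(img1, img2, L=255.0):
    """Simplified SSIM computed once over the whole image (the standard
    formulation averages local windows; this global variant is illustrative)."""
    x = np.asarray(img1, dtype=float)
    y = np.asarray(img2, dtype=float)
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2          # stabilizing constants
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def count_high_similarity(frames, ssim_thresh=0.9):
    """Count consecutive key-frame pairs whose SSIM exceeds ssim_thresh;
    a larger count indicates more redundancy among the extracted key frames."""
    return sum(1 for a, b in zip(frames, frames[1:])
               if global_ssim(a, b) > ssim_thresh)
```

Identical frames score an SSIM of 1, so the count directly reflects how many near-duplicate pairs survive the extraction.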
In addition, the number of key frames the proposed algorithm extracts at turns is much larger than on straight roads, which is better than the other algorithms and meets the expected demand for images in different scenes of a 3D reconstruction application. The completeness of the final reconstructed model is 100%, 100%, 97.46%, and 96.54% on the four data groups, clearly better than the other algorithms.
Conclusion
This paper proposes a key frame extraction method for 3D reconstruction based on mutual-check weighted optical flow. It effectively reduces the number of video frames and improves the quality of key frame screening in diverse scenes; at the same time, the extracted key frames improve the matching accuracy and stability of adjacent frames and enhance the robustness of 3D reconstruction.