Automatic wave height detection from nearshore wave videos

Song Wei; Zhou Xu; Bi Fan; Guo Donglin; Gao Song; He Qi; Bai Zhipeng

Published: 2020-03-16
Abstract views: 2738
Full-text downloads: 320
DOI: 10.11834/jig.190138
2020 | Volume 25 | Number 3


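Song Wei¹, Zhou Xu¹, Bi Fan²,³, Guo Donglin²,³, Gao Song²,³, He Qi¹, Bai Zhipeng⁴ (1. College of Information, Shanghai Ocean University, Shanghai 201306; 2. North China Sea Marine Forecasting Center of State Oceanic Administration, Qingdao 266061; 3. Shandong Provincial Key Laboratory of Marine Ecological Environment and Disaster Prevention and Mitigation, Qingdao 266061; 4. 61741 PLA Troops)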

Abstract

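Objective Current methods for detecting wave parameters from visual information fall into two groups: those based on stereo vision and those based on video/image features. The former resolve wave height unstably, rely on complex models, lack robustness, and do not meet the needs of practical applications. The latter mainly detect wave direction and wave height level and cannot obtain precise wave height values; among them, detection based on image features is limited by prior knowledge and is not stable. To address this, this paper draws on the feature learning mechanism of deep learning and proposes an automatic wave height detection method for nearshore wave videos. Method Video frames are extracted from nearshore wave surveillance video, and the difference between each pair of adjacent frames is computed to obtain differential images. Data preprocessing is used to augment both the static image set and the differential image set. A multi-layer local-perception convolutional neural network structure, network in network (NIN), is designed for each of the two image sets, and the network models are pre-trained. The pre-trained models then extract high-level features from the static images and the differential images to represent spatial and temporal information, respectively, and the two types of features are fused. Finally, a pre-trained support vector regression (SVR) model performs automatic wave height detection. Result Experiments show that the proposed method achieves an average absolute error of 0.1095 m and an average relative error of 7.39% in wave height detection. The test-set accuracy under different absolute error ranges shows that the accuracy of the regression model based on fused temporal and spatial information varies more smoothly, whereas the accuracy of the NIN model based on spatial information alone varies widely; the proposed method therefore has good detection stability. Conclusion Extracting and fusing temporal and spatial information from nearshore video images with pre-trained convolutional neural networks effectively compensates for the incompleteness of hand-crafted features, is robust for wave height detection from nearshore videos, and is practical within operational detection requirements (average relative error of wave height ≤ 20%).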
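The feature fusion step in the Method summary can be illustrated with a minimal Python sketch. It assumes fusion by simple concatenation, which is what the English abstract states. The function names and the feature values are hypothetical, and the element-wise variant is included only for contrast; it is not taken from the paper.

```python
from typing import List, Sequence


def fuse_by_concatenation(spatial: Sequence[float], temporal: Sequence[float]) -> List[float]:
    """Join the two feature vectors end to end; every value is kept unchanged."""
    return list(spatial) + list(temporal)


def fuse_elementwise(spatial: Sequence[float], temporal: Sequence[float]) -> List[float]:
    """Element-wise sum, for contrast only: needs equal lengths and mixes both sources."""
    if len(spatial) != len(temporal):
        raise ValueError("element-wise fusion needs vectors of equal length")
    return [s + t for s, t in zip(spatial, temporal)]


# Made-up feature vectors standing in for the outputs of the two pre-trained NINs.
spatial_features = [0.5, 1.5, 0.25, 2.0]  # from a static frame (hypothetical values)
temporal_features = [0.75, 0.125]         # from a differential image (hypothetical values)

# The fused vector is what a regressor such as SVR would receive as input.
fused = fuse_by_concatenation(spatial_features, temporal_features)
print(len(fused), fused)
```

Concatenation places no constraint on the lengths of the two vectors and keeps each source's values separate, which may be one reason to prefer it when the two networks differ in depth and output size.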

Keywords

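wave height detection; nearshore wave video; deep learning; multi-layer local-perception convolutional neural network (NIN); feature extraction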

Automatic wave height detection from nearshore wave videos

Song Wei¹, Zhou Xu¹, Bi Fan²,³, Guo Donglin²,³, Gao Song²,³, He Qi¹, Bai Zhipeng⁴ (1. College of Information, Shanghai Ocean University, Shanghai 201306, China; 2. North China Sea Marine Forecasting Center of State Oceanic Administration, Qingdao 266061, China; 3. Shandong Provincial Key Laboratory of Marine Ecological Environment and Disaster Prevention and Mitigation, Qingdao 266061, China; 4. 61741 PLA Troops, China)

Abstract

Objective Nearshore waves are strongly affected by seabed topography, the shore, and environmental flows; they follow complex evolutionary laws and change faster in time and space than open-sea waves. Measuring nearshore wave height is therefore important for nearshore engineering design, shallow-sea production operations, and nearshore environmental protection. Traditional wave height measurement relies mainly on wave buoy monitoring. Compared with this approach, nearshore video surveillance offers uninterrupted data acquisition and a rich visual representation of waves. However, research on automatic wave height detection from nearshore videos is still insufficient. Existing methods of wave height detection based on visual information can be divided into two categories: 1) Wave parameter detection based on stereo vision. Most of these models are complex, lack robustness, and detect wave height unstably; thus, they cannot satisfy practical application requirements. 2) Wave parameter detection based on features extracted from images or videos, including statistical features, transform-domain features, and texture features. This type of method is mainly used to detect wave direction and wave height level; it requires features to be designed in advance and is thereby limited by prior knowledge. In recent years, deep learning has achieved considerable success in image identification, natural language processing, and object recognition. Drawing on the feature learning mechanism of deep learning, this paper proposes an automatic wave height detection method for nearshore wave videos. Method The proposed method mainly involves data preprocessing, model design and feature fusion, and regression prediction. First, video frames are extracted from the nearshore surveillance video at intervals, and each pair of adjacent frames is subtracted to form a set of differential frames.
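The frame subtraction step just described can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: frames are taken to be grayscale and stored as nested lists of integer intensities, the function name `difference_frames` is hypothetical, and the absolute difference is an assumption, since the abstract only says that adjacent frames are subtracted.

```python
from typing import List

# Grayscale image as rows of pixel intensities (assumed representation).
Frame = List[List[int]]


def difference_frames(frames: List[Frame]) -> List[Frame]:
    """Return the absolute pixel-wise difference of each pair of adjacent frames."""
    diffs = []
    for prev, curr in zip(frames, frames[1:]):
        diffs.append([
            [abs(c - p) for p, c in zip(prev_row, curr_row)]
            for prev_row, curr_row in zip(prev, curr)
        ])
    return diffs


# Three tiny 2x2 "frames" sampled at intervals (made-up values).
frames = [
    [[10, 10], [50, 60]],
    [[10, 12], [80, 40]],
    [[10, 11], [70, 90]],
]
diffs = difference_frames(frames)
print(len(diffs), "differential frames from", len(frames), "video frames")
```

A sequence of N sampled frames yields N - 1 differential frames, and pixels that do not change between two frames map to zero.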
The dataset of original video frames contains static spatial information about the waves, and the dataset of differential images contains their motion information. To avoid the influence of reefs and buildings on wave feature extraction, we cropped the wave area from the video by eliminating near-zero parts of the differential image. To enhance the generalization ability and robustness of the model, we applied data augmentation, rotating and stretching the images to increase the size and diversity of the datasets. Second, a network in network (NIN)-based system for wave height detection was constructed. High-level spatial and temporal features are learned by two independent NINs that take static and differential images of the waves as input. A 4-layer NIN structure is used for spatial feature learning, and a 2-layer structure is used for temporal feature learning. The two types of features are fused by simple concatenation because pixel-wise fusion may introduce mutual interference and information loss. Finally, the fused features are fed into a support vector regression (SVR) model, which maps the features into 1D space and performs regression to detect the wave height of nearshore video images automatically. Result Our wave videos were collected at a marine station in the China Sea from November 2015 to November 2016. The shooting time ranged from 7 a.m. to 4 p.m. To explore the performance of the different network models, the effect of the input sample size on the wave height detection results, and the role of feature fusion, we conducted three sets of experiments. Experiment 1: We compared our NIN-based wave height detection model with a classic 2-layer convolutional neural network and a more advanced dense convolutional network (DenseNet).
With the root-mean-square error (RMSE) between predicted wave heights and the ground truth as the assessment index, the comparison shows that the NIN-based network model achieves more accurate wave height prediction than the other two models, with an RMSE of 0.1884. Experiment 2: To select an appropriate network input size, wave height detection models were trained with different sample sizes, and their performance was compared on the test datasets under absolute error tolerances ranging from 0.2 to 0.4. The results show that an input sample size of 32×32 pixels gives the highest accuracy when the absolute error is below 0.2, and its accuracy changes relatively steadily as the absolute error range changes. Considering both the completeness of the image feature information and noise interference, the experimental data in this study were set to a uniform size of 32×32 pixels. Experiment 3: The role of temporal and spatial feature fusion was examined. The high-level spatial features were learned from static video frames and the temporal features from the differential image of two adjacent frames. Compared with using spatial features only, fusing spatial and temporal features significantly increased wave height detection accuracy under the various absolute error tolerances, and the detected wave height fluctuated less. The average absolute error of the method in wave height detection is 0.1095 m, and the average relative error is 7.39%. In terms of wave height levels, the wave height detected by the proposed method satisfies an absolute error range of ±0.1 m for waves below level 2. For waves above level 2, the average relative error is less than 20%, which satisfies the operational requirements of wave forecasting. Conclusion The method can automatically obtain wave height values from nearshore wave videos and effectively compensates for the incompleteness of hand-crafted features.
Moreover, our method is practical within the scope of operational detection requirements (average relative error of wave height ≤ 20%). It meets the accuracy and efficiency requirements of significant wave height detection and provides a new way to use nearshore videos for wave monitoring.
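The two assessment measures used in the experiments, RMSE and accuracy under an absolute error tolerance, can be sketched as follows. The abstract does not define the accuracy measure, so this sketch assumes it is the fraction of test samples whose absolute error falls below the tolerance. The function names and the wave heights are hypothetical and serve only as an illustration.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root-mean-square error between predicted and reference wave heights."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def accuracy_within(predicted: Sequence[float], actual: Sequence[float], tolerance: float) -> float:
    """Fraction of samples whose absolute error is below the tolerance (assumed definition)."""
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) < tolerance)
    return hits / len(actual)


# Made-up wave heights in metres, for illustration only.
actual = [1.0, 2.0, 0.5, 1.5]
predicted = [1.1, 1.75, 0.5, 1.85]

print(f"RMSE: {rmse(predicted, actual):.4f}")
for tol in (0.2, 0.3, 0.4):
    print(f"accuracy at abs. error < {tol}: {accuracy_within(predicted, actual, tol):.2f}")
```

Sweeping the tolerance, as in the loop above, gives the kind of accuracy curve that the abstract uses to compare how steadily different models behave across error ranges.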

Keywords

wave height detection; nearshore wave video; deep learning; network in network architecture; feature extraction
