Continuous graph convolutional model for video smoke detection

Yang Longzhen; Yuan Feiniu; Yang Shouyuan; Lei Bangjun; Zhang Xiangfen

Published: 2019-10-12
DOI: 10.11834/jig.190232
2019 | Volume 24 | Number 10

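Yang Longzhen¹, Yuan Feiniu¹,², Yang Shouyuan¹, Lei Bangjun³, Zhang Xiangfen² (1. School of Information Technology, Jiangxi University of Finance and Economics, Nanchang 330032; 2. College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 201418; 3. Hubei Key Laboratory of Intelligent Vision Based Monitoring for Hydroelectric Engineering, China Three Gorges University, Yichang 443002)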

Abstract

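Objective Video smoke detection plays an important role in early fire warning. Current video-based smoke detection methods mainly use structured models to extract the static and dynamic features of smoke regions and treat smoke information identically or similarly in time and space, ignoring both the continuity of video data along the timeline and the unstructured relationships among features. Graph convolutional networks (GCNs) and neural ordinary differential equations (ODEs) have clear advantages in handling non-Euclidean structures and continuous models, so we combine the two and propose a graph-based smoke detection model built on video streams and a continuous time domain.

Method Mainstream video smoke detection models are still based on discrete models and extract data features in a regular form. We use an ODE network to build a continuous-time model that captures the hidden information between video frames. Video frames with a fixed time span are treated as sample points on a continuous time axis, and the predictive capability of the model is used to fill in information lost between frames and to simulate and predict future frames to a certain extent. The generated frame features are passed to a graph convolutional network, which remodels them, and the features are finally classified with both fully supervised and weakly supervised methods.

Result The model is trained and tested on 2 video datasets and 4 image datasets and compared with the latest mainstream deep learning methods. On the KMU (Korea Maritime University) video dataset, the average true positive rate (ATPR) is 0.6% higher than that of the second-best model. On 2 image datasets, accuracy is 0.21% and 0.06% higher and detection rate is 0.54% and 0.28% higher than the second-best model, respectively; on the image set of single video frames, accuracy exceeds the second-best model by 0.88%. Comparative experiments were also conducted on the Bilkent dataset to verify the effectiveness of the continuous latent model in predicting smoke dynamics and the smoke origin point. The results show that the proposed continuous model can effectively predict smoke dynamics and infer the location of the smoke origin.

Conclusion The proposed continuous graph convolutional model combines the advantages of structured and unstructured models, captures smoke dynamics, and effectively infers the smoke origin location, which makes smoke detection results more accurate.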
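The continuous-time idea in the Method paragraph can be illustrated with a minimal sketch. It is not the authors' implementation: the function `dynamics` is a hand-written toy standing in for a learned neural ODE, the two-dimensional state, the rates `A_RATE` and `B_RATE`, and the fixed-step Runge-Kutta solver are all assumptions made for illustration. The sketch only shows how a state observed at a frame time can be evaluated between frames, extrapolated past the last frame, and integrated backward along a reversed timeline.

```python
import math

A_RATE, B_RATE = 0.1, 0.5  # hypothetical growth and rotation rates of the toy dynamics


def dynamics(z, t):
    """Toy stand-in for the learned dynamics f(z, t); a trained network would go here."""
    x, y = z
    return [A_RATE * x - B_RATE * y, B_RATE * x + A_RATE * y]


def exact_state(z0, t):
    """Closed-form solution of the toy system, used only to sanity-check the solver."""
    x, y = z0
    g, c, s = math.exp(A_RATE * t), math.cos(B_RATE * t), math.sin(B_RATE * t)
    return [g * (c * x - s * y), g * (s * x + c * y)]


def rk4_step(f, z, t, h):
    """One classical Runge-Kutta step of size h (h may be negative)."""
    k1 = f(z, t)
    k2 = f([zi + 0.5 * h * ki for zi, ki in zip(z, k1)], t + 0.5 * h)
    k3 = f([zi + 0.5 * h * ki for zi, ki in zip(z, k2)], t + 0.5 * h)
    k4 = f([zi + h * ki for zi, ki in zip(z, k3)], t + h)
    return [zi + h / 6.0 * (a + 2 * b + 2 * c + d)
            for zi, a, b, c, d in zip(z, k1, k2, k3, k4)]


def integrate(f, z0, t0, t1, steps=100):
    """Integrate dz/dt = f(z, t) from t0 to t1; t1 < t0 runs the timeline in reverse."""
    h = (t1 - t0) / steps
    z, t = list(z0), t0
    for _ in range(steps):
        z = rk4_step(f, z, t, h)
        t += h
    return z


z_frame0 = [1.0, 0.0]                                  # state observed at frame time t = 0
z_between = integrate(dynamics, z_frame0, 0.0, 0.5)    # state between frames 0 and 1
z_future = integrate(dynamics, z_frame0, 0.0, 3.0)     # extrapolation past the last frame
z_origin = integrate(dynamics, z_future, 3.0, 0.0)     # reversed timeline back to the start
```

Because the solver accepts a negative step, the same routine serves both forward prediction and backward tracing; in a learned model the accuracy of either direction would depend on how well the dynamics were fitted.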

Keywords

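video smoke detection; smoke recognition; graph convolutional network; neural ordinary differential equations; metric learning; weakly supervised learning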

Continuous graph convolutional model for video smoke detection

Yang Longzhen¹, Yuan Feiniu¹,², Yang Shouyuan¹, Lei Bangjun³, Zhang Xiangfen² (1. School of Information Technology, Jiangxi University of Finance and Economics, Nanchang 330032, China; 2. College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 201418, China; 3. Hubei Key Laboratory of Intelligent Vision Based Monitoring for Hydroelectric Engineering, China Three Gorges University, Yichang 443002, China)

Abstract

Objective Video smoke detection plays an important role in real-time fire alarms because it overcomes the limitations of sensors in large spaces, outdoors, and other environments with strong air turbulence. Current video-based methods mainly extract the static and dynamic features of smoke and process them with the same structured model, which may disregard the temporal continuity of video and the unstructured relationships among features. Graph convolutional networks (GCNs) and neural ordinary differential equations (ODEs) are well suited to non-Euclidean structures and continuous-time models, respectively, which makes them applicable to video smoke detection. Building on these methods, we propose a continuous graph model based on video streams for smoke detection.

Method We construct a continuous-timeline model using a neural ODE network, whereas most methods in video smoke detection still focus on discrete spatial-temporal features in Euclidean space. We treat video frames with fixed time spans as sample points on a continuous timeline. By simulating the latent space of hidden variables with a latent time-series model, we obtain the hidden information between frames that discrete models may disregard. Once the model is established, the information lost between frames and short-term future frames can be predicted, which allows fire alarms to be raised earlier. For detection, we use GCNs to extract the features of each video frame (or block), which are then classified with fully and weakly supervised methods. Because real smoke video datasets lack bounding-box or pixel-level ground-truth labels, we pretrain our model on a number of smoke images and use it to predict the labels of sliding windows in a labeled video frame. This process is used to find the fire origin point or to predict the motion of smoke.

Result We compare our model with seven state-of-the-art video smoke detection models and five image detection models, including traditional approaches and deep learning methods, on two video and four image datasets. The video data are collected from KMU, Bilkent, USTC, and Yuan. The quantitative evaluation metrics are detection rate (DR), accuracy rate (AR), false alarm rate, average true positive rate (ATPR), average false positive rate, average true negative rate, and F-measure (F2). We provide several latent models of each method for comparison. Experimental results show that our model outperforms most other methods on the KMU and Yuan datasets. The visualized detection samples show that our model captures the dynamic motion features of smoke and predicts the fire origin point by combining these features. Comparative experiments demonstrate that the continuous model improves smoke detection accuracy. Compared with the 3D parallel convolutional network and other methods on the KMU video dataset, ATPR increases by 0.6%. Compared with DMCNN and other methods on the Yuan image datasets, AR increases by 0.21% and 0.06%, respectively. Although the state-of-the-art models already exceed 98%, DR still increases by 0.54% and 0.28%. In addition, we conduct a series of experiments on the Bilkent video dataset to verify the effectiveness and robustness of our latent model in predicting smoke motion. Using screenshots taken from real smoke videos, we first sample several frames at random, slide a bounding-box window over each frame to divide it into image blocks, and predict the block labels with our continuous graph convolutional model. We use the pretrained model because the real smoke videos have no bounding-box or pixel-level labels. We then feed the center points of these samples to our latent model and predict the labels of the bounding boxes in the current image. Visualizing the smoke areas detected by our model shows that the latent model correctly tracks the diffusion direction of smoke and updates its location. By reversing the timeline fed to the latent model, we obtain the trajectory of smoke diffusion back to its origin point. These results demonstrate that the latent model can predict smoke motion and infer the fire origin point. However, quantitative verification has not yet been conducted.

Conclusion We propose a video-based continuous graph convolutional model that combines the strengths of structured and unstructured models. The model captures the dynamic information of smoke and effectively predicts the fire origin point. Experimental results show that it outperforms several state-of-the-art approaches to video and image smoke detection.
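The evaluation metrics named in the Result paragraph can be made concrete with a short sketch. The abstract does not give their formulas, so the definitions below are the conventional confusion-matrix ones and should be read as assumptions: DR as recall on smoke samples, AR as overall accuracy, false alarm rate as the false positive rate, F2 as the F-measure with beta = 2, and ATPR as the mean of per-video true positive rates. The function names and the example counts are hypothetical.

```python
def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def smoke_metrics(tp, fp, tn, fn):
    """Compute common detection metrics from confusion-matrix counts.

    tp/fn count smoke samples detected/missed; fp/tn count non-smoke
    samples flagged/correctly rejected. Definitions are assumed, not
    taken from the paper.
    """
    dr = safe_div(tp, tp + fn)                   # detection rate (recall)
    far = safe_div(fp, fp + tn)                  # false alarm rate
    ar = safe_div(tp + tn, tp + fp + tn + fn)    # accuracy rate
    precision = safe_div(tp, tp + fp)
    f2 = safe_div(5 * precision * dr, 4 * precision + dr)  # F-measure with beta = 2
    return {"DR": dr, "FAR": far, "AR": ar, "F2": f2}


def average_rate(per_video_rates):
    """Average a per-video rate (for example TPR) over all test videos."""
    return safe_div(sum(per_video_rates), len(per_video_rates))


# Hypothetical counts, for illustration only.
metrics = smoke_metrics(tp=90, fp=5, tn=95, fn=10)
atpr = average_rate([0.9, 1.0, 0.8])
```

F2 weights recall more heavily than precision, which suits fire alarms, where a missed detection is usually costlier than a false alarm.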

Keywords

video smoke detection; smoke recognition; graph convolutional network (GCN); neural ordinary differential equations; metric learning; weakly supervised learning