Automatic hazard early warning in electric power scenes with visual relationship detection

Gao Ming; Zuo Hongqun; Bai Fan; Tian Qingyang; Ge Zhifeng; Dong Xingning; Gan Tian

Published: 2021-07-16
Abstract views: 1688
Full-text downloads: 1034
DOI: 10.11834/jig.200502
2021 | Volume 26 | Number 7

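Gao Ming¹, Zuo Hongqun¹, Bai Fan², Tian Qingyang², Ge Zhifeng¹, Dong Xingning³, Gan Tian³ (1. State Grid Zhejiang Ninghai County Power Supply Co., Ltd., Ningbo 315600; 2. Ninghai Yancangshan Electric Power Construction Co., Ltd., Ningbo 315600; 3. School of Computer Science and Technology, Shandong University, Qingdao 266237)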

Abstract

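Objective: Using the strong recognition and detection capabilities of deep learning to assist staff with hazard description and operation warnings in electric power scenes is a relatively economical and efficient means of power safety supervision. However, current mainstream early-warning systems built on object detection can only provide information about some dangerous targets. They ignore the unary danger relationships of power equipment and the potential binary danger relationships between paired objects. Unlike previous methods, and in order to extend the recognition ability and functional scope of the hazard early-warning module, this paper proposes a method that automatically generates hazard early-warning descriptions in electric power scenes based on visual relationship detection.

Method: Given an image to be inspected, the object detection module produces the category names and bounding box positions of the objects in the image. Semantic features, visual features, and spatial location features are then extracted separately, and the fused feature is fed into the relationship detection module, which outputs the unary relationships of single objects and the relationship triplets between paired objects. Based on the detected object categories and relationship information, the system performs hazard prediction and produces a warning description.

Result: We collected and annotated images of electric power production operations from multiple scenarios and carried out extensive ablation experiments. The experiments show that the relationship detector combining semantic, spatial, and visual features reaches 86.80% on Recall@5 and 93.93% on Recall@10, about 15% higher than a relationship detector that uses visual features only.

Conclusion: The proposed visual relationship detection network with fused multimodal feature input finds the best-matching predicate well, reduces unreasonable relationship predictions, and shows some zero-shot learning capability. Visualization results indicate that the overall system handles the hazard early-warning description task in electric power scenes well.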
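The Recall@5 and Recall@10 figures above can be read with the following minimal sketch of Recall@K as it is commonly defined for relationship detection: the fraction of ground-truth triplets that appear among the K highest-scoring predictions. It assumes matching on labels only (the paper's evaluation may additionally require bounding-box overlap), and the function name `recall_at_k` and the sample triplets are hypothetical, not taken from the paper.

```python
from typing import List, Tuple

Triplet = Tuple[str, str, str]  # (subject, predicate, object)


def recall_at_k(predictions: List[Tuple[Triplet, float]],
                ground_truth: List[Triplet],
                k: int) -> float:
    """Fraction of ground-truth triplets found among the top-k scored predictions."""
    truth = set(ground_truth)
    if not truth:
        return 0.0
    # Rank predictions by confidence, highest first, and keep the first k.
    ranked = sorted(predictions, key=lambda p: p[1], reverse=True)
    top_k = {triplet for triplet, _ in ranked[:k]}
    hits = sum(1 for gt in truth if gt in top_k)
    return hits / len(truth)


# Hypothetical predictions for one image: (triplet, confidence).
sample_predictions = [
    (("worker", "not wearing", "helmet"), 0.9),
    (("worker", "stands on", "ladder"), 0.4),
    (("worker", "near", "transformer"), 0.7),
]
sample_truth = [
    ("worker", "not wearing", "helmet"),
    ("worker", "stands on", "ladder"),
]

print(recall_at_k(sample_predictions, sample_truth, k=1))  # 0.5
print(recall_at_k(sample_predictions, sample_truth, k=3))  # 1.0
```

A larger K admits more candidate triplets per image, which is why Recall@10 is never lower than Recall@5 for the same detector.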

Keywords

hazard early warning; object detection; visual relationship detection; multimodal feature fusion; multi-label margin loss

Visual relationship detection-based emergency early-warning description generation in electric power industry

Gao Ming¹, Zuo Hongqun¹, Bai Fan², Tian Qingyang², Ge Zhifeng¹, Dong Xingning³, Gan Tian³ (1. State Grid Ninghai Power Supply Company, Ningbo 315600, China; 2. Ninghai Yancang Mountain Electric Power Construction Company, Ningbo 315600, China; 3. School of Computer Science and Technology, Shandong University, Qingdao 266237, China)

Abstract

Objective: Deep learning research has grown steadily over the past decade, and much of it has aimed at improving the learning capability of deep neural networks. As a result, a growing number of regulators in the electric power industry use deep learning techniques, with their strong recognition and detection capabilities, to build surveillance systems that greatly reduce the risk of major accidents in daily work. However, most current early-warning systems are based on object detection, which can only annotate dangerous targets within an image and ignores important information about the unary relationships of electrical equipment and the binary relationships between paired objects. This limits their capability for emergency recognition and forewarning. With the arrival of powerful object detectors such as Faster R-CNN (region-based convolutional neural network) and large visual datasets such as Visual Genome, visual relationship detection has attracted much attention in recent years. Building on the basic blocks of single-object detection and understanding, visual relationship detection aims not only to localize a pair of objects accurately but also to determine the predicate between them. As a mid-level learning task, it captures the detailed semantics of a visual scene by explicitly modeling objects together with their relationships to other objects. It thereby bridges the gap between low-level visual tasks and high-level vision-language tasks, and helps machines solve more challenging visual tasks such as image captioning, visual question answering, and image generation. However, developing robust algorithms that recognize relationships between paired objects remains difficult because of factors such as highly diverse visual features within the same predicate category, incomplete annotation and long-tailed distributions in datasets, and the problem of finding the optimal predicate match. Although numerous methods have been proposed to build efficient relationship detectors, few of them concentrate on applying the technology in practice.

Method: Unlike existing methods, our method introduces visual relationship detection into early-warning systems. Specifically, it not only identifies dangerous objects but also recognizes the potential unary or binary relationships that may cause an accident. We propose a two-stage emergency recognition and forewarning system for the electric power industry, consisting of a pre-trained object detection module and a relationship detection module. The pipeline has three steps. First, we train an object detection module based on Faster R-CNN in advance; given an image, this detector localizes all object bounding boxes and annotates their categories. Then, the relationship detection module integrates multiple cues (visual appearance, spatial location, and semantic embedding) to compute the predicate confidence of every object pair and outputs the top-scoring instances as the relationship predictions. Finally, based on the target and relationship information provided by the detectors, the system performs emergency prediction and generates a warning description that may help regulators in the electric power industry make suitable decisions.

Result: We conduct several experiments to demonstrate the efficiency and superiority of our method. First, we collect and build a dataset consisting of a large number of images from multiple scenarios in the electric power industry. Following instructions from experts, we define and label the relationship categories that may pose risks in the images of the dataset. Then, according to the number of objects forming a relationship, we divide the dataset into two parts. Our experiments therefore involve two tasks: unary relationship detection and binary relationship detection. For unary relationship detection, we use precision and recall as the evaluation metrics. For binary relationship detection, the evaluation metrics are Recall@5 and Recall@10. Because the proposed relationship detection module uses multiple cues to learn a holistic representation of a relationship instance, we conduct ablation experiments to explore their influence on the final performance. The results show that the detector using visual, spatial, and semantic features as input achieves the best performance: 86.80% in Recall@5 and 93.93% in Recall@10.

Conclusion: Extensive experiments show that the proposed method is efficient and effective in detecting defective electrical equipment and dangerous relationships between paired objects. Moreover, we formulate a pre-defined rule to generate the early-warning description from the results of the object and relationship detectors. Together, these methods can help regulators take proper and timely action to avoid harmful accidents in the electric power industry.
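The last step of the pipeline, turning detected relationships into a warning description through a pre-defined rule, might look like the sketch below. The abstract does not specify the rule set, so every name here is an assumption made for illustration: the `Relationship` type, the `DANGER_RULES` table, the object and predicate labels, and the warning texts are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Relationship:
    subject: str
    predicate: str
    obj: Optional[str]   # None marks a unary relationship (single object)
    confidence: float    # predicate confidence from the relationship detector


# Hypothetical rule table: (subject, predicate, object) -> warning text.
DANGER_RULES: Dict[Tuple[str, str, Optional[str]], str] = {
    ("insulator", "damaged", None): "Warning: damaged insulator detected.",
    ("worker", "without", "helmet"): "Warning: worker is not wearing a helmet.",
}


def generate_warnings(relationships: List[Relationship], top_k: int = 5) -> List[str]:
    """Keep the top_k most confident relationships and map the dangerous ones to warnings."""
    ranked = sorted(relationships, key=lambda r: r.confidence, reverse=True)[:top_k]
    warnings: List[str] = []
    for rel in ranked:
        message = DANGER_RULES.get((rel.subject, rel.predicate, rel.obj))
        # Skip relationships with no matching rule and avoid repeated messages.
        if message is not None and message not in warnings:
            warnings.append(message)
    return warnings


# Hypothetical detector output for one image.
detections = [
    Relationship("worker", "without", "helmet", 0.92),
    Relationship("worker", "holds", "tool", 0.81),
    Relationship("insulator", "damaged", None, 0.65),
]

for line in generate_warnings(detections):
    print(line)
```

Keeping the rules in a lookup table separates what counts as dangerous from the detectors themselves, so domain experts could in principle extend the table without retraining either module.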

Keywords

emergency early-warning; object detection; visual relationship detection; multimodal feature fusion; multi-label margin loss
