Fire Detection based on Convolutional Neural Networks with Channel Attention

Authors:
Xiaobo Zhang
School of Automation
Southeast University
Nanjing 210096, China
zxb852@sina.com
Kun Qian
School of Automation, Southeast University,
Nanjing 210096, China
Key Laboratory of Measurement and
Control of Complex Systems of Engineering,
Ministry of Education of China
kqian@seu.edu.cn
Kaihe Jing
School of Automation
Southeast University
Nanjing 210096, China
928104929@qq.com
Jianwei Yang
Future Science & Technology Park
Changping District,
Beijing, China, 102209
hvdcyjw@sina.com
Hai Yu
Global Energy Interconnection Research
Institute, State Grid
Nanjing, China, 210000
83616883@qq.com
Abstract—Existing research on fire detection is mostly based on two-stage methods, which are slow and whose localization accuracy is limited by the first-stage candidate-region extraction algorithm. To achieve high-precision real-time fire detection, this paper proposes a Yolo detection network combined with an attention mechanism. An attention module is serially added to the final convolutional layers at each of the three scales of Yolo v3. The channel attention module updates each feature map as a weighted sum over all channels, capturing the semantic dependencies between channels in the deep layers of the network and improving the generalization ability of the model. Experiments show that the proposed method improves the accuracy of fire detection without reducing detection speed.
Keywords—Image recognition; fire detection; deep learning; attention mechanism; Yolo
I. INTRODUCTION
There are flammable and explosive materials in typical industrial production scenes such as power plants. Once a fire occurs, it causes great damage to industrial production and endangers personnel safety. In addition, there may be multiple operating devices in the production environment, and knowing the location of the burning device allows better response measures to be taken. Therefore, real-time fire detection is necessary.
Fire detection methods can be divided into two categories: 1) traditional fire detection and 2) vision-based fire detection. Traditional fire detection is generally based on various types of sensors, including smoke, temperature, and photosensitive sensors. It is a close-range technique that can only monitor a limited region of space, is not suitable for open spaces, and cannot locate the specific position of a flame.
Computer vision and deep learning have developed rapidly in recent years, and image-based algorithms can effectively overcome the shortcomings of traditional fire detection. An image has a wider field of view and a longer detection range than traditional sensors, and it contains more information, including the position of the flame. Surveillance cameras are common in production scenes and are inexpensive. Image-based fire detection is likewise divided into traditional methods and deep learning methods. Among deep learning methods, most current research focuses on image-level fire detection rather than region-level fire detection. However, in typical power plant environments such as control rooms and substations, detecting and localizing flame regions in images is essential for identifying the burning equipment.
Yolo v3 is a recently proposed object detection network with excellent performance. This paper proposes a method that combines the channel attention mechanism with the Yolo v3 network, using a channel attention module to capture the deep semantic feature dependencies of Yolo. Experiments show that the method improves the accuracy of fire detection in multi-scale scenes without reducing detection speed.
II. RELATED WORK
Image processing algorithms provide an effective solution to the problems of traditional fire detection. Fire detection based on image processing is divided into traditional methods and deep learning methods. Traditional recognition algorithms can be summarized into three stages: 1) flame pixel classification, 2) motion detection, and 3) candidate region feature analysis. Flame pixel classification mostly establishes static color models in various color spaces, which generate flame candidate areas; motion detection detects the dynamic features of the flame; candidate region feature analysis comprehensively analyzes the results of the previous stages and realizes fire detection through conditional filtering. Flame pixel classification and motion detection build on each other in different methods. Celik et al. [9] used manually marked flame masks to extract flame pixel values for subsequent color model research. Celik et al. [1] proposed a flame color model represented by geometric shapes in RGB space.

This work is supported by the National Natural Science Foundation of China (Grant No. 61573101) and the Science and Technology Program of Global Energy Interconnection Research Institute "Research on infrared/ultraviolet image recognition algorithm and software modules development".

2020 Chinese Automation Congress (CAC) | 978-1-7281-7687-1/20/$31.00 ©2020 IEEE | DOI: 10.1109/CAC51589.2020.9327309
In recent years, deep learning has become the mainstream approach to fire detection. At present, most research identifies whether there is flame in an image, but few works address flame localization. Dunnings et al. [7] used superpixel segmentation to divide the picture into regions, performed fire detection on each segmented block, and finally took the union of the detected regions to obtain the segmented fire area. The localization accuracy of this two-stage method is poor, and the superpixel segmentation step is very time-consuming. Using a traditional color-model algorithm, Zhong et al. [4] proposed a two-stage method combining candidate region generation with a classification network, but the candidate regions were not used to locate the flame area, and the output is only a binary classification of the image. Zhao et al. [14] used saliency detection to extract suspected fire areas, calculated the color and texture features of each ROI, and then used two logistic regression classifiers to classify the ROI feature vectors; this is also a two-stage detection method with limited localization accuracy and detection speed.
Yolo v3 is a recently proposed detection network with excellent performance. It converts the bounding box localization problem into a regression problem, has high detection accuracy and speed, and is well suited to real-time object detection. The attention mechanism was first used in machine translation and has been applied to convolutional networks for image processing in recent years, with good results. Fu et al. [6] proposed dual spatial and channel attention modules that capture the spatial and channel relationships of deep semantics. Wu et al. [2] introduced the channel attention mechanism into a flame classification network and used it to learn the nonlinear interactions between channels, improving the accuracy of flame type classification. This paper combines Yolo v3 with the attention mechanism to achieve accurate and efficient fire detection.
III. BUILDING FLAME DATASET BY RGB-T AUTOMATIC
ANNOTATION
At present, most flame datasets on the Internet only provide binary labels rather than the bounding box labels required for fire detection. Therefore, this paper constructs a flame dataset. In order to obtain sample labels accurately and quickly, our previous work on RGB-T image registration was used. The RGB-IR camera shown in Fig. 1 was used to collect flame samples, and flames in multi-scale scenes were collected as paired RGB-T samples; a total of 5 videos were collected. The RGB images are used as the input of the network, and the infrared images are used by the automatic annotation algorithm to generate sample labels. In order to accurately segment the flame mask in the infrared image, we kept the relative position and viewing angle between the camera and the flame fixed and minimized the background interference in the infrared image.
Fig. 1. RGB-IR camera (with separate RGB and thermal image channels, mounted on a camera tripod)
The process of automatic annotation, shown in Fig. 2, is based on our previous work on RGB-T image registration [18]. The mutual information method is used to calculate the transformation between a pair of infrared and RGB images so as to realize automatic annotation. The flame samples of the RGB and infrared images are registered to obtain the transformation matrix M from the visible pixel coordinate system to the infrared pixel coordinate system:
$$M = \begin{bmatrix} R & T \\ 0 & 1 \end{bmatrix} \tag{1}$$

$$\begin{bmatrix} X' \\ Y' \\ 1 \end{bmatrix} = \begin{bmatrix} R & T \\ 0 & 1 \end{bmatrix} \begin{bmatrix} X \\ Y \\ 1 \end{bmatrix} \tag{2}$$

Here R is the rotation and T the translation between the two pixel coordinate systems, and (X', Y') are the infrared pixel coordinates corresponding to the visible pixel (X, Y).
First, we use the transformation matrix to transform the RGB
images as shown in (2), and then crop the infrared field of view
and generate pixel-aligned pairs of RGB-T samples. An
example is shown in Fig. 3.
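As a small sketch of this step, the mapping of a visible pixel to infrared coordinates via (2) can be written as below; the rotation and translation values are illustrative placeholders, not values from the paper (the real M comes from registering each video pair).

```python
import numpy as np

# Illustrative R (rotation) and T (translation); the actual values are
# estimated by the mutual-information registration, not given here.
theta = np.deg2rad(2.0)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
T = np.array([[12.0], [-5.0]])

# M = [[R, T], [0, 1]] as in (1)
M = np.block([[R, T], [np.zeros((1, 2)), np.ones((1, 1))]])

def rgb_to_ir(x, y):
    """Map a visible-light pixel (x, y) to infrared pixel coordinates,
    applying (2) in homogeneous coordinates."""
    xh = M @ np.array([x, y, 1.0])
    return float(xh[0]), float(xh[1])
```

Since one transformation matrix serves a whole video, this mapping only has to be estimated once per sample video.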
Fig. 2. Automatic annotation process
A sample video only needs to be registered once, and the
transformation matrix is applicable to all frames of this sample
video. Because the pixels in the flame area have higher
brightness in the infrared image, simple image processing
methods can be used to obtain the mask of the flame in the
infrared image. Since the infrared image and the RGB image are
pixel-aligned, the mask of the flame in the infrared image is the
same as the mask of the flame in the RGB image. Finally, take
the minimum bounding rectangle to obtain the sample label.
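The mask extraction and minimum-bounding-rectangle step can be sketched as follows; the brightness threshold is an illustrative assumption, not a value given in the paper.

```python
import numpy as np

def flame_bbox(ir_gray, thresh=200):
    """Extract the flame mask from a grayscale infrared frame by simple
    brightness thresholding (flame pixels are the brightest), then return
    the minimum axis-aligned bounding rectangle (x, y, w, h), or None if
    no pixel exceeds the threshold. The threshold value is illustrative."""
    ys, xs = np.nonzero(ir_gray >= thresh)
    if xs.size == 0:
        return None
    x0, y0 = xs.min(), ys.min()
    x1, y1 = xs.max(), ys.max()
    return int(x0), int(y0), int(x1 - x0 + 1), int(y1 - y0 + 1)
```

Because the RGB and infrared frames are pixel-aligned, the rectangle found in the infrared frame can be used directly as the label on the RGB frame.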
Fig. 3. Example of generating pixel-aligned pairs of RGB-T samples
Fig. 4 shows some examples of automatic annotation. The
left column is the infrared images, the middle column is the
flame masks extracted by the image processing algorithm, and
the right column is the generated sample labels.
Fig. 4. Examples of automatic annotation
In order to expand the diversity of the sample set, 10 flame videos were downloaded from the Internet, covering flame samples in different scenes (indoor and outdoor, long and short distance, day and night), and manually labeled. Finally, the original dataset was expanded by data augmentation, and the final flame dataset contains 6000 samples.
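The paper does not specify which data enhancement operations were used; a minimal sketch assuming two common choices (horizontal flip with box mirroring, and brightness jitter) might look like this.

```python
import numpy as np

def augment(image, box, rng):
    """Horizontal flip (with box mirroring) and brightness jitter.
    image: H x W x 3 uint8 array; box: (x, y, w, h) flame label;
    rng: a numpy random Generator. Both operations are assumed
    choices, not taken from the paper."""
    h, w = image.shape[:2]
    x, y, bw, bh = box
    if rng.random() < 0.5:
        image = image[:, ::-1]          # flip left-right
        x = w - x - bw                  # mirror the box x-coordinate
    gain = rng.uniform(0.8, 1.2)        # brightness jitter
    image = np.clip(image.astype(np.float32) * gain, 0, 255).astype(np.uint8)
    return image, (x, y, bw, bh)
```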
IV. PROPOSED METHOD
The proposed method combines the Yolo v3 network with the attention mechanism. The network structure is shown in Fig. 5. The attention module is serially added to the final three convolutional networks at different scales. The structure of the attention module follows DANet [6]. The deep layers of the network mainly extract semantic information, in which each channel corresponds to a certain type of semantic response. The channel attention module is used to capture the semantic dependencies between channels and encode channel context information into local features. The input is the original feature maps, and the dependency between feature maps is used as the weight to update them: each updated feature map is obtained by weighting and summing all the feature maps, so as to capture the semantic dependency between channels and strengthen the expressive ability of semantic features.
The channel attention mechanism is shown in Fig. 6, where the scale of the input feature maps A is C×H×W. First, A is reshaped to a C×N feature map, where N=H×W is the number of pixels in each channel; A is also reshaped and transposed to obtain an N×C feature map. The product of the two is passed through a softmax layer to obtain the channel attention matrix X (C×C), each element of which is:
$$x_{ji} = \frac{\exp(A_i \cdot A_j)}{\sum_{i=1}^{C} \exp(A_i \cdot A_j)}$$
The next step is to reshape the original input feature maps A to a scale of C×N, multiply it by the transpose of X, and reshape the result back to C×H×W, the same scale as the input. Finally, the result is multiplied by a scale factor and added to the original feature maps to obtain the final output E (C×H×W):
$$E_j = \beta \sum_{i=1}^{C} \left( x_{ji} \cdot A_i \right) + A_j$$
7KH VFDOH IDFWRU ȕ LV WKH RQO\ WUDLQLQJ SDUDPHWHU LQ Whe
channel attention module, which is initialized to 0 and is
continuously assigned weights during the training process. X is
the channel attention matrix, each row of which represents the
weighting relationship between the channels. A value in a row
indicates the weight of the corresponding layer. The final output
feature E has the same scale as the input feature A, and the
feature map of each channel is the weighted sum of each channel
of the original input A. The channel attention module captures
the relevance of different layers of semantic features, improves
the ability to express semantic features, and models the semantic
dependence between different channels.
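A minimal NumPy sketch of the channel attention update described above (with β as the single trainable scale, initialized to 0 so the module starts as an identity mapping) could look like this; it is an illustrative re-implementation, not the authors' code.

```python
import numpy as np

def channel_attention(A, beta=0.0):
    """Channel attention update: each output channel j is
    beta * sum_i x_ji * A_i + A_j, where X is a softmax over channel
    similarities. A: (C, H, W) feature maps; beta: the trainable
    scale factor, initialized to 0."""
    C, H, W = A.shape
    F = A.reshape(C, H * W)                       # C x N, with N = H*W
    logits = F @ F.T                              # C x C dot products A_i . A_j
    logits -= logits.max(axis=1, keepdims=True)   # for numerical stability
    X = np.exp(logits)
    X /= X.sum(axis=1, keepdims=True)             # row-wise softmax -> x_ji
    E = beta * (X @ F) + F                        # weighted channel sum + residual
    return E.reshape(C, H, W)
```

With beta=0 the output equals the input, which matches the initialization described above.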
Fig. 6. Channel attention module
Fig. 5. Yolo v3 detection network combined with the attention mechanism (input 416×416×3; Darknet-53 backbone without FC layers; an attention module at each of the 13×13, 26×26, and 52×52 scales; multi-scale outputs of 13×13×255, 26×26×255, and 52×52×255, fused via upsampling and concatenation)
V. EXPERIMENT AND EVALUATION
The test was conducted on a computer with an Intel Core i7-9700K CPU @ 3.60 GHz, an NVIDIA RTX 2080 Ti GPU, 16 GB of RAM, and Ubuntu 16.04. The detection performance of the network was tested on two test sets. The first test set was generated using the hold-out method: the original dataset was randomly divided into a training set and a test set, with the test set containing 600 samples; its distribution is close to that of the training set. Yolo v3 and the method proposed in this paper (Yolo v3-SA) were tested on this first test set, using the Average Precision (AP) calculation of Pascal VOC. At an IOU threshold of 0.8, the AP of Yolo v3 is 66.3% and the AP of the proposed Yolo v3-SA is 66.77%, a slight increase. The channel attention mechanism captures the semantic dependencies between channels by weighting and summing the channels, but it also blurs the original semantic features; as a result, the improvement is not obvious on this first test set, whose distribution is close to the training set.
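For reference, the IOU used in these AP thresholds is the standard intersection over union of a predicted box and a ground-truth box; a minimal sketch (assuming (x, y, w, h) boxes, a convention not stated in the paper):

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))   # overlap width
    ih = max(0, min(ay + ah, by + bh) - max(ay, by))   # overlap height
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0
```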
During RGB-T registration, the RGB image channel was transformed and cropped, so part of the field of view was lost, producing a distribution shift relative to the original flame samples. To test the detection performance of the model on samples with a different distribution, 250 new test samples were generated by manually labeling the original flame samples. The first test set was expanded with these new samples; the resulting set contains 850 test samples and is called the generalized test set.
First, the detection speed was tested; the results are shown in Table I. The experiment shows that the proposed method reduces the detection speed only slightly, and the algorithm remains real-time.
TABLE I. DETECTION SPEED

Algorithm            Yolo v3    Yolo v3-SA
Frames Per Second    22.14      21.90
The two models were tested on the generalized test set. Table II lists the AP of the two models under different IOU thresholds, and the precision-recall curves at an IOU of 0.75 are shown in Fig. 7. The accuracy of the proposed method on the generalized test set is 4.57% higher than that of Yolo v3. Fig. 9 shows some test examples.
TABLE II. DETECTION AVERAGE PRECISION

Algorithm     0.85 IOU   0.80 IOU   0.75 IOU   0.70 IOU   0.65 IOU
Yolo v3       28.99%     52.88%     69.31%     80.04%     85.79%
Yolo v3-SA    29.57%     55.86%     73.88%     86.51%     91.74%
(a) Yolo v3 (b) Yolo v3-SA
Fig. 7. Precision-Recall curve of 0.75 IOU
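The AP values reported here are areas under precision-recall curves like those in Fig. 7; a minimal sketch of a Pascal VOC-style AP computation (all-point interpolation, an assumption about the exact variant used):

```python
def voc_ap(recall, precision):
    """Pascal VOC-style AP: area under the precision-recall curve after
    enforcing a monotonically non-increasing precision envelope
    (the all-point interpolation variant)."""
    r = [0.0] + list(recall) + [1.0]
    p = [0.0] + list(precision) + [0.0]
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])          # precision envelope
    ap = 0.0
    for i in range(1, len(r)):
        ap += (r[i] - r[i - 1]) * p[i]      # sum of rectangle areas
    return ap
```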
Finally, supplementary experiments were performed on the public Pascal VOC 2007 dataset to verify the effectiveness of the proposed method. The mAP results are shown in Fig. 8: the mean Average Precision of the proposed method is 0.54% higher than that of Yolo v3.
(a) Yolo v3 (b) Yolo v3-SA
Fig. 8. Supplementary experimental test results on the voc2007 dataset
Fig. 9. Examples of Yolo v3-SA fire detection
VI. CONCLUSION
In order to solve the problem of high-precision real-time fire detection, this paper proposes a Yolo detection network combined with an attention mechanism. The flame dataset is generated using an automatic annotation algorithm and data augmentation. By combining the attention mechanism with the Yolo v3 network, the channel attention module captures the semantic dependencies between channels. Experiments show that the proposed method improves the accuracy of fire detection without reducing the detection speed. Future work will address the lack of flame samples from some actual production scenes to further improve the generalization ability of the model.
REFERENCES
[1] T. Celik, H. Demirel, and H. Ozkaramanli, "Automatic fire detection in
video sequences," in 2006 14th European Signal Processing Conference,
2006: IEEE, pp. 1-5.
[2] Y. Wu, Y. He, P. Shivakumara, Z. Li, H. Guo, and T. Lu, "Channel-wise
attention model-based fire and rating level detection in video," CAAI
Transactions on Intelligence Technology, vol. 4, no. 2, pp. 117-121, 2019.
[3] %87|UH\LQ <'HGHR÷OX 8*GNED\DQG$( &HWLQ&RPSXWHU
vision based method for real-time fire and flame detection," Pattern
recognition letters, vol. 27, no. 1, pp. 49-58, 2006.
[4] Z. Zhong, M. Wang, Y. Shi, and W. Gao, "A convolutional neural
network-based flame detection method in video sequence," Signal, Image
and Video Processing, vol. 12, no. 8, pp. 1619-1627, 2018.
[5] K. Muhammad, J. Ahmad, I. Mehmood, S. Rho, and S. W. Baik,
"Convolutional neural networks based fire detection in surveillance
videos," IEEE Access, vol. 6, pp. 18174-18183, 2018.
[6] J. Fu et al., "Dual attention network for scene segmentation," in
Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, 2019, pp. 3146-3154.
[7] A. J. Dunnings and T. P. Breckon, "Experimentally defined convolutional
neural network architecture variants for non-temporal real-time fire
detection," in 2018 25th IEEE International Conference on Image
Processing (ICIP), 2018: IEEE, pp. 1558-1562.
[8] X. Wang, A. Shrivastava, and A. Gupta, "A-fast-rcnn: Hard positive
generation via adversary for object detection," in Proceedings of the IEEE
conference on computer vision and pattern recognition, 2017, pp. 2606-
2615.
[9] T. Celik and H. Demirel, "Fire detection in video sequences using a
generic color model," Fire safety journal, vol. 44, no. 2, pp. 147-158, 2009.
[10] B. U. Toreyin, Y. Dedeoglu, and A. E. Cetin, "Flame detection in video
using hidden Markov models," in IEEE International Conference on
Image Processing 2005, 2005, vol. 2: IEEE, pp. II-1230.
[11] W. Phillips III, M. Shah, and N. da Vitoria Lobo, "Flame recognition in
video," Pattern recognition letters, vol. 23, no. 1-3, pp. 319-327, 2002.
[12] Y. Ren, C. Zhu, and S. Xiao, "Object detection based on fast/faster RCNN
employing fully convolutional architectures," Mathematical Problems in
Engineering, vol. 2018, 2018.
[13] C. Hu, P. Tang, W. Jin, Z. He, and W. Li, "Real-time fire detection based
on deep convolutional long-recurrent networks and optical flow method,"
in 2018 37th Chinese Control Conference (CCC), 2018: IEEE, pp. 9061-
9066.
[14] Y. Zhao, J. Ma, X. Li, and J. Zhang, "Saliency detection and deep
learning-based wildfire identification in UAV imagery," Sensors, vol. 18,
no. 3, p. 712, 2018.
[15] B. Kim and J. Lee, "A video-based fire detection using deep learning
models," Applied Sciences, vol. 9, no. 14, p. 2862, 2019.
[16] C. B. Liu and N. Ahuja, "Vision based fire detection," in Proceedings of
the 17th International Conference on Pattern Recognition, 2004. ICPR
2004., 2004, vol. 4: IEEE, pp. 134-137.
[17] Q. X. Zhang, G. H. Lin, Y. M. Zhang, G. Xu, and J. I. Wang, "Wildland
forest fire smoke detection based on faster R-CNN using synthetic smoke
images," Procedia engineering, vol. 211, pp. 441-446, 2018.
[18] J. Ma, K. Qian, X. Zhang, and X. Ma, "Weakly Supervised Instance
Segmentation of Electrical Equipment Based on RGB-T Automatic
Annotation," IEEE Transactions on Instrumentation and Measurement,
2020, DOI:10.1109/TIM.2020.3001796
... [25] use a squeeze and excitation block for channel attention, and Woo et al. [26] add spatial awareness to improve [25] even further. Adding channel attention in [27] and spatial awareness in [28] have shown promising results for large-scale fires, but the performance of these models still needs to be improved for small fires after a certain depth. More recently, based on the vision transformer [29], fire detection was proposed to determine how to use the transformer framework to capture the global view of the image [30]. ...
Article
Full-text available
Fires are becoming one of the major natural hazards that threaten the ecology, economy, human life and even more worldwide. Therefore, early fire detection systems are crucial to prevent fires from spreading out of control and causing destruction. Based on vision sensors, many fire detection techniques have evolved with the recent surge of curiosity in deep learning, which exploits the spatial features of individual images. However, fire can take different forms, scales, and combustion materials can produce different colors, making accurate fire detection from an image challenging. Small fires captured from long-distance cameras lack salient features, further complicating detection. This paper proposes a hybrid structure that uses attention-enhanced convolutional neural networks and vision transformers (CNN-ViT) to detect fire. The proposed CNN-ViT first pays spatial attention to every frame and then aggregates temporal contextual information from neighboring frames to improve detection performance. Due to the limited availability of training fire datasets, the study employs deep transfer learning for feature extraction using pre-trained CNN. We used various metrics to examine the efficacy of the proposed approach. The results showed that the CNN-ViT method outperformed previous models based on spatial-temporal characteristics by achieving a relative improvement in accuracy and F1 score. The satisfactory results on images contaminated with different intensities of noise confirm the robustness of the approach.
... YOLO has become one of the most optimal object detection algorithms at present. There are improvement options to choose YOLOv3, combined with the CAM [13], or in the detection output module, the improved K-means algorithm optimizes the prior box [14], or adds a variable convolution module [15]. The stability of model accuracy is greatly affected by the environment. ...
Article
Full-text available
To overcome low efficiency and accuracy of existing forest fire detection algorithms, this paper proposes a network model to enhance the real-time and robustness of detection. This structure is based on the YOLOv5 target detection algorithm and combines the backbone network with The feature extraction module combines the attention module dsCBAM improved by depth-separable convolution, and replaces the loss function CIoU of the original model with a VariFocal loss function that is more suitable for the imbalanced characteristics of positive and negative samples in the forest fire data set. Experiments were conducted on a self-made and public forest fire data set. The accuracy and recall rate of the model can reach 87.1% and 81.6%, which are 7.40% and 3.20% higher than the original model, and the number of images processed per second reaches 64 frames, a growth rate of 8.47%. At the same time, this model was compared horizontally with other improved methods. The accuracy, recall rate and processing speed were all improved in the range of 3% to 10%. The effectiveness of the improved method in this article was verified, and the external perception level of the forest fire scene was deeper.
... Qin Y Y et al. [22] combined classification and target detection models based on the YOLOv3 algorithm and depth-separable convolution for fire detection. Zhang X et al. [41] combined the YOLOv3 network with channel attention to improve the feature extraction ability of the network for flame and smoke, but the channel attention ignores the location information and fails to establish a remote dependency, resulting in partial information loss. Xu R et al. [34] addressed the limitation of feature extraction by a single network model in a complex scene by combining two parallel models, YOLOv5 and EfficientNet [28], to detect fire collaboratively and improve the fire detection ability. ...
Preprint
Full-text available
For fire detection, there are characteristics such as variable sample feature morphology, complex background and dense target, small sample size of dataset and imbalance ofcategories, which lead to the problems of low accuracy and poor real-time performanceof the existing fire detection models. We propose a flame smoke detection model basedon efficient multi-scale feature enhancement, i.e., EA-YOLO. In order to improve theextraction capability of the network model for flame target features, an efficient attentionmechanism is integrated into the backbone network, Multi Channel Attention (MCA), andthe number of parameters of the model is reduced by introducing the RepVB module;at the same time, we design a multi-weighted multidirectional feature neck structure,Multidirectional Feature Pyramid Network (MDFPN), to enhance the model’s flametarget feature information fusion ability; finally, the CIoU loss function is redesigned byintroducing the Slide weighting function to improve the imbalance between difficult andeasy samples. In addition, to address the problem of a small sample size of fire datasets,this paper establishes two fire datasets, Fire-Smoke and Ro-Fire-Smoke, of which thelatter has the model robustness validation function. The experimental results show that themethod of this paper is 6.5% and 7.3% higher compared to the baseline model YOLOv7on the Fire-Smoke and Ro-Fire-Smoke datasets, respectively. The detection speed is 74.6frames per second. It fully demonstrates that the method in this paper has high flamedetection accuracy while considering the real-time nature of the model. The source codeand dataset are located at https://github.com/DIADH/DIADH.YOLO.
... These affect a lot to the performance of forest fire detector. Most research focuses on model optimization such as the improvement of detection recall [34], but few focuses on smoke-and flame-like scenes [13]. Through the experiment, we observe that our Fire-PPYOLOE is able to detect the smoke-and fire-like objects by leveraging large kernel convolution. ...
Article
Full-text available
Forest fire has the characteristics of sudden and destructive, which threatens safety of people’s life and property. Automatic detection and early warning of forest fire in the early stage is very important for protecting forest resources and reducing disaster losses. Unmanned forest fire monitoring is one popular way of forest fire automatic detection. However, the actual forest environment is complex and diverse, and the vision image is affected by various factors easily such as geographical location, seasons, cloudy weather, day and night, etc. In this paper, we propose a novel fire detection method called Fire-PPYOLOE. We design a new backbone and neck structure leveraging large kernel convolution to capture a large arrange area of reception field based on the existing fast and accurate object detection model PP-YOLOE. In addition, our model maintains the high-speed performance of the single-stage detection model and reduces model parameters by using CSPNet significantly. Extensive experiments are conducted to show the effectiveness of Fire-PPYOLOE from the views of detection accuracy and speed. The results show that our Fire-PPYOLOE is able to detect the smoke- and flame-like objects because it can learn features around the object to be detected. It can provide real-time forest fire prevention and early detection.
... The ground-truth labels are sometimes not credible, resulting in low-quality predictions of the bounding boxes. In previous research work, most models focused on model optimization, such as better detection accuracy, lighter weight models, and faster detection speed [34][35][36]. However, the smoke and flame boundary situation that exists in reality was not considered, and the algorithm proposed in this paper takes into account the smoke and flame boundary situation. ...
Article
Full-text available
Fire perception based on machine vision is essential for improving social safety. Object recognition based on deep learning has become the mainstream smoke and flame recognition method. However, the existing anchor-based smoke and flame recognition algorithms are not accurate enough for localization due to the irregular shapes, unclear contours, and large-scale changes in smoke and flames. For this problem, we propose a new anchor-free smoke and flame recognition algorithm, which improves the object detection network in two dimensions. First, we propose a channel attention path aggregation network (CAPAN), which forces the network to focus on the channel features with foreground information. Second, we propose a multi-loss function. The classification loss, the regression loss, the distribution focal loss (DFL), and the loss for the centerness branch are fused to enable the network to learn a more accurate distribution for the locations of the bounding boxes. Our method attains a promising performance compared with the state-of-the-art object detectors; the recognition accuracy improves by 5% for the mAP, 8.3% for the flame AP50, and 2.1% for the smoke AP50 compared with the baseline model. Overall, the algorithm proposed in this paper significantly improves the accuracy of the object detection network in the smoke and flame recognition scenario and can provide real-time fire recognition.
... Our proposed model has significant drawbacks; for instance, when we tested the model in various situations, electric light or the sun in some cases were regarded as fire. To address this issue, we aim to improve the suggested model using additional datasets from other contexts [63][64][65][66][67]. In the custom dataset, we also did not add any classes for smoke. ...
Article
Full-text available
Citation: Norkobil Saydirasulovich, S.; Abdusalomov, A.; Jamil, M.K.; Nasimov, R.; Kozhamzharova, D.; Cho, Y.-I. A YOLOv6-Based Improved Fire Detection Approach for Smart City Environments. Sensors 2023, 23, 3161. https://doi. Abstract: Authorities and policymakers in Korea have recently prioritized improving fire prevention and emergency response. Governments seek to enhance community safety for residents by constructing automated fire detection and identification systems. This study examined the efficacy of YOLOv6, a system for object identification running on an NVIDIA GPU platform, to identify fire-related items. Using metrics such as object identification speed, accuracy research, and time-sensitive real-world applications, we analyzed the influence of YOLOv6 on fire detection and identification efforts in Korea. We conducted trials using a fire dataset comprising 4000 photos collected through Google, YouTube, and other resources to evaluate the viability of YOLOv6 in fire recognition and detection tasks. According to the findings, YOLOv6's object identification performance was 0.98, with a typical recall of 0.96 and a precision of 0.83. The system achieved an MAE of 0.302%. These findings suggest that YOLOv6 is an effective technique for detecting and identifying fire-related items in photos in Korea. Multi-class object recognition using random forests, k-nearest neighbors, support vector, logistic regression, naive Bayes, and XGBoost was performed on the SFSC data to evaluate the system's capacity to identify fire-related objects. The results demonstrate that for fire-related objects, XGBoost achieved the highest object identification accuracy, with values of 0.717 and 0.767. This was followed by random forest, with values of 0.468 and 0.510. Finally, we tested YOLOv6 in a simulated fire evacuation scenario to gauge its practicality in emergencies. 
The results show that YOLOv6 can accurately identify fire-related items in real time within a response time of 0.66 s. Therefore, YOLOv6 is a viable option for fire detection and recognition in Korea. The XGBoost classifier provides the highest accuracy when attempting to identify objects, achieving remarkable results. Furthermore, the system accurately identifies fire-related objects while they are being detected in real-time. This makes YOLOv6 an effective tool to use in fire detection and identification initiatives.
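The precision and recall figures reported for YOLOv6 above come from standard detection counts. As a reminder of how such scores are computed, here is a minimal sketch; the counts in the usage example are illustrative, not the paper's actual confusion matrix.

```python
def detection_metrics(tp, fp, fn):
    """Precision and recall from true-positive, false-positive, and
    false-negative detection counts (illustrative helper, not tied to
    the cited paper's evaluation code)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

For example, 96 true positives, 20 false positives, and 4 missed fires give a recall of 0.96 and a precision of about 0.83 — the same order as the scores quoted above.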
... We introduce a classical encoder-decoder architecture to explore the multiscale semantic features, as shown in Fig. Dynamic attention strategy for scale and spatial locations: attention mechanisms have shown excellent performance in a wide range of computer vision tasks, such as flame detection [28]. We find that recent studies focus only on channel or spatial attention for flame detection [29][30][31][32]. To address this deficiency, we introduce a novel mechanism named dynamic attention, inspired by [33], which separately deploys weights on particular level-wise and spatial-wise dimensions. ...
Article
Full-text available
Recently, flame detection has attracted great attention. However, existing methods have the issues of low detection rates, high false alarm rates, and lack of smoke anti‐interference ability. In this letter, a novel dynamic attention‐based network (DANet) is proposed for autonomous flame detection in various scenarios. To mitigate the disturbance of smoke in images, a dynamic attention strategy is proposed to discover the potential features among scale‐awareness and spatial‐awareness. Then, based on dynamic attention module, a decoupled detection head is presented, which can predict category, regression, and object score independently to boost the performance. A self‐contained challenging flame dataset, which is multi‐scene, multiscale, and multi‐interference informative is constructed to evaluate the proposed model and organize the experiments. Extensive ablation and comparison studies on self‐labelled dataset reveal the effectiveness of the proposed dynamic attention‐based network.
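The level-wise part of a dynamic attention strategy — weighting feature pyramid levels by learned scores before fusing them — can be sketched with a softmax over per-level scores. This is a conceptual illustration of scale-aware weighting, not the DANet implementation; the scores and features in the example are made up.

```python
import math

def scale_attention(level_scores, level_features):
    """Softmax the per-level scores into attention weights and fuse
    the (flattened) per-level feature vectors as a weighted sum.
    A sketch of level-wise attention only; spatial attention would
    apply an analogous weighting per spatial location."""
    m = max(level_scores)                      # subtract max for stability
    exps = [math.exp(s - m) for s in level_scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    fused = [sum(w * f[i] for w, f in zip(weights, level_features))
             for i in range(len(level_features[0]))]
    return weights, fused
```

Equal scores yield uniform weights; in a trained network the scores would favor the pyramid level whose scale best matches the flame.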
... The existence of these smoky fire targets will adversely affect the accuracy of model detection. In previous research work, most models focus on model optimization, such as better detection accuracy, lighter-weight models, and faster detection speed [32][33][34]. However, the smoke-like fire cases that occur in reality were not considered; the Fire-YOLO model proposed in this paper detects these smoke-like fire cases. ...
Article
Full-text available
For the detection of small targets, fire-like and smoke-like targets in forest fire images, as well as fire detection under different natural lights, an improved Fire-YOLO deep learning algorithm is proposed. The Fire-YOLO detection model expands the feature extraction network from three dimensions, which enhances feature propagation of fire small targets identification, improves network performance, and reduces model parameters. Furthermore, through the promotion of the feature pyramid, the top-performing prediction box is obtained. Fire-YOLO attains excellent results compared to state-of-the-art object detection networks, notably in the detection of small targets of fire and smoke. Overall, the Fire-YOLO detection model can effectively deal with the inspection of small fire targets, as well as fire-like and smoke-like objects. When the input image size is 416 × 416 resolution, the average detection time is 0.04 s per frame, which can provide real-time forest fire detection. Moreover, the algorithm proposed in this paper can also be applied to small target detection under other complicated situations.
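Selecting the "top-performing prediction box" mentioned above is conventionally done by ranking candidates by intersection-over-union (IoU) against the ground truth. A minimal IoU sketch (generic, not Fire-YOLO's code):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as
    (x1, y1, x2, y2) corner coordinates — the usual criterion for
    ranking candidate prediction boxes against ground truth."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0
```

Two 2×2 boxes overlapping in a unit square share 1 of 7 total units of area, so their IoU is 1/7; disjoint boxes score 0.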
Article
Video-based fire detection systems represent an innovative path in fire signalling. Thanks to a suitably designed algorithm, a system of this kind can enable the detection of a flame based on its characteristics such as colour or shape, which were not previously used in classical fire detection systems. Video-based detection systems, due to their early stage of development in the fire protection market, are not yet a certified, fully tested method for early fire detection. This paper focuses on the analysis of possible causes of false alarms occurring in video-based fire detection systems in relation to classical Fire Alarm Systems (FAS). For this purpose, a video-based flame detection algorithm is designed and implemented to further analyse the phenomena occurring in such systems.
Article
The importance of power plant safety is increasing in an era of continual technological development. When a fire occurs in a power plant, it causes huge material losses, social unrest, and even casualties. The paper studies common fire-warning methods and models, and introduces several recognition techniques based on flames or smoke. It improves an automated power plant fire identification system based on the vision transformer, and demonstrates the advantages of the technology through comparative analysis.
Article
Full-text available
To address the problem of weakly supervised instance segmentation for electrical equipment using an RGB camera only, an automatic annotation of masks of samples (AAMS) method based on thermal image guidance is proposed in this paper. With image-level label supervision only, we exploit foreground segmentation results of thermal images to guide the instance mask extraction of electrical equipment in RGB images through a heterogeneous pixel registration algorithm between RGB-T image pairs. Instance masks are annotated automatically, which greatly improves efficiency and decreases costs. In addition, we further propose a progressively optimized model (POM) for instance segmentation, which first utilizes the fully-connected conditional random field (CRF) and the constrain-to-boundary loss to specify fine-detailed boundaries of each object and to solve the difficulty of segmenting electrical equipment with complicated structures. This model also explores self-paced learning to address the resolution differences between RGB-T image pairs and improve generalization ability. Compared to the other state-of-the-arts, experimental results show that our method achieves by far the best performance on the electrical equipment dataset.
Article
Full-text available
Fire is an abnormal event which can cause significant damage to lives and property. In this paper, we propose a deep learning-based fire detection method using a video sequence, which imitates the human fire detection process. The proposed method uses Faster Region-based Convolutional Neural Network (R-CNN) to detect the suspected regions of fire (SRoFs) and of non-fire based on their spatial features. Then, the summarized features within the bounding boxes in successive frames are accumulated by Long Short-Term Memory (LSTM) to classify whether there is a fire or not in a short-term period. The decisions for successive short-term periods are then combined in the majority voting for the final decision in a long-term period. In addition, the areas of both flame and smoke are calculated and their temporal changes are reported to interpret the dynamic fire behavior with the final fire decision. Experiments show that the proposed long-term video-based method can successfully improve the fire detection accuracy compared with the still image-based or short-term video-based method by reducing both the false detections and the misdetections.
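The long-term decision stage described above combines short-term fire/no-fire decisions by majority voting. A minimal sketch of that voting step (the tie-breaking rule, defaulting to no fire, is an assumption not stated in the abstract):

```python
def long_term_fire_decision(short_term_decisions):
    """Majority vote over a sequence of short-term fire/no-fire
    decisions (True = fire detected in that short-term period).
    Ties default to no fire — an assumption for this sketch."""
    fire_votes = sum(1 for d in short_term_decisions if d)
    return fire_votes > len(short_term_decisions) / 2
```

Two fire votes out of three yield a long-term fire alarm; one out of two does not, under the assumed tie rule.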
Article
Full-text available
Due to natural disasters and global warming, unexpected fires can occur, causing panic among people and even leading to death. To reduce the impact of fire, the authors propose a new method for detecting and rating fire in video through deep-learning models, so that rescue teams can save people's lives. The proposed method explores a hybrid deep convolutional neural network, which involves motion detection and maximally stable extremal regions for detecting and rating fire in video. Further, the authors propose to use a channel-wise attention mechanism in the deep neural network for rating the fire level. Experimental results on a large dataset show the proposed method outperforms existing methods for detecting and rating fire in video.
Article
Full-text available
Computer vision-based fire detection is one of the crucial tasks in modern surveillance systems. In recent years, the convolutional neural network (CNN) has become an active topic because of its high recognition accuracy in a wide range of applications. How to reliably and effectively solve the problem of flame detection, however, has remained challenging in practice. In this paper, we propose a novel real-time flame detection algorithm based on a CNN that processes the video data generated by an ordinary camera monitoring a scene. Firstly, to improve the efficiency of recognition, a candidate target area extraction algorithm is proposed for dealing with suspected flame areas. Secondly, the extracted feature maps of candidate areas are classified by the designed CNN-based deep neural network model. Finally, the corresponding alarm signal is obtained from the classification results. The experimental results show that the proposed method can effectively identify fire and achieve a higher alarm rate on the homemade database. The proposed method can effectively realize real-time fire warning in practice.
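Candidate flame area extraction of the kind described above is commonly seeded by a color rule before the CNN classifies the region. A generic rule-based sketch (not the paper's exact algorithm — the classic flame-color heuristic R > G > B with a red threshold; the threshold value is a hypothetical tuning constant):

```python
def candidate_flame_mask(rgb_image):
    """Mark candidate flame pixels in an image given as nested lists
    of (R, G, B) tuples, using the classic flame-color rule R > G > B
    with R above a threshold. A pre-filter sketch only; a CNN would
    then classify the connected candidate regions."""
    R_THRESHOLD = 180  # hypothetical tuning constant for 8-bit images
    return [[1 if (r > R_THRESHOLD and r > g > b) else 0
             for (r, g, b) in row]
            for row in rgb_image]
```

A bright orange pixel such as (200, 120, 40) passes the rule, while a bluish pixel like (100, 120, 140) does not.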
Article
Full-text available
The recent advances in embedded processing have enabled vision-based systems to detect fire during surveillance using convolutional neural networks (CNNs). However, such methods generally need more computational time and memory, restricting their implementation in surveillance networks. In this research article, we propose a cost-effective fire detection CNN architecture for surveillance videos. The model is inspired by the GoogLeNet architecture, considering its reasonable computational complexity and suitability for the intended problem compared to other computationally expensive networks such as AlexNet. To balance efficiency and accuracy, the model is fine-tuned considering the nature of the target problem and fire data. Experimental results on benchmark fire datasets reveal the effectiveness of the proposed framework and validate its suitability for fire detection in CCTV surveillance systems compared to state-of-the-art methods.
Article
Full-text available
An unmanned aerial vehicle (UAV) equipped with a global positioning system (GPS) can provide direct georeferenced imagery, mapping an area with high resolution. So far, the major difficulty in wildfire image classification is the lack of unified identification marks: the fire features of color, shape, and texture (smoke, flame, or both) and the background can vary significantly from one scene to another. Deep learning (e.g., a DCNN, Deep Convolutional Neural Network) is very effective in high-level feature learning; however, a substantial training-image dataset is required to optimize its weight values and coefficients. In this work, we proposed a new saliency detection algorithm for fast location and segmentation of the core fire area in aerial images. As the proposed method can effectively avoid the feature loss caused by direct resizing, it is used in data augmentation and in the formation of a standard fire image dataset 'UAV_Fire'. A 15-layered self-learning DCNN architecture named 'Fire_Net' is then presented as a self-learning fire feature extractor and classifier. We evaluated different architectures and several key parameters (dropout ratio, batch size, etc.) of the DCNN model with regard to validation accuracy. The proposed architecture outperformed previous methods by achieving an overall accuracy of 98%. Furthermore, 'Fire_Net' guaranteed an average processing speed of 41.5 ms per image for real-time wildfire inspection. To demonstrate its practical utility, Fire_Net was tested on 40 sampled images from wildfire news reports, and all of them were accurately identified.
Article
Full-text available
In this paper, Faster R-CNN was used to detect wildland forest fire smoke, avoiding the complex manual feature extraction process of traditional video smoke detection methods. Synthetic smoke images were produced by inserting real smoke or simulative smoke into forest backgrounds to compensate for the lack of training data. The models trained on the two kinds of synthetic images were tested on a dataset consisting of real fire smoke images. The results show that simulative smoke is the better choice and that the model is insensitive to thin smoke. It may be possible to further boost performance by improving the synthesis process for forest fire smoke images or by extending this solution to video sequences.
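The synthesis step described above — inserting a smoke patch into a forest background — is commonly done by per-pixel alpha blending. A minimal grayscale sketch of that compositing operation (an illustration of the idea, not the cited paper's pipeline; the pixel values are made up):

```python
def composite_smoke(background, smoke, alpha):
    """Blend a synthetic smoke patch into a background image
    per pixel: out = (1 - a) * background + a * smoke.
    All three arguments are same-sized nested lists; alpha values
    lie in [0, 1], with 0 keeping the background unchanged."""
    return [[(1.0 - alpha[i][j]) * background[i][j]
             + alpha[i][j] * smoke[i][j]
             for j in range(len(background[0]))]
            for i in range(len(background))]
```

Where alpha is 0 the background pixel survives unchanged; where it is 0.5 the result is the midpoint of background and smoke, giving the semi-transparent look of thin smoke.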