
Refined Edge Detection With Cascaded and High-Resolution Convolutional Network

Authors: Omar Elharrouss, Youssef Hmamouche, Assia Kamal Idrissi, Btissam El Khamlichi, Amal El Fallah-Seghrouchni
International Artificial Intelligence Center of Morocco (Ai Movement) - University Mohammed VI Polytechnique
Abstract
Edge detection is one of the most challenging tasks in computer vision, due to the difficulty of detecting the edges or boundaries in real-world images that contain objects of different types and scales, such as trees and buildings, as well as various backgrounds. Edge detection is also a key task for many computer vision applications. Using a set of backbones as well as attention modules, deep-learning-based methods have improved the detection of edges compared with traditional methods like Sobel or Canny. However, images of complex scenes still represent a challenge for these methods. Also, the edges detected using the existing approaches suffer from non-refined results with erroneous edges. In this paper, we attempt to overcome these challenges for refined edge detection using a cascaded and high-resolution network named CHRNet. To maintain the high resolution of edges during the training process and to conserve the resolution of the edge image throughout the network stages, sub-blocks are connected at every stage with the output of the previous layer. Also, after each layer, we use a batch normalization layer with an active affine parameter as an erosion operation for the homogeneous regions in the image. The proposed method is evaluated using the most challenging datasets, including BSDS500, NYUD, and Multicue. The obtained results outperform the existing edge detection networks in terms of performance metrics and quality of output images. The code is available at: https://github.com/elharroussomar/chrnet/
Keywords: Edge detection, Convolutional neural networks, Deep learning,
Scale-representation, Backbone.
1. Introduction
Extracting the salient edges from natural images represents a challenge for computer vision applications [1, 2]. The complexity of images, the objects in them (trees, buildings, cars), and the collisions between the components of images make the separation of edges a difficult operation for statistical-based methods [3, 4, 5]. Nowadays, after the introduction of deep learning techniques as well as the development of machine performance with GPUs, this task has become feasible with convincing accuracy and with the possibility of implementation in real-time applications [7, 8].
Edge detection is the operation of extracting the contours of different objects while automatically ignoring the other details [9, 10]. It is also the operation of labeling the boundaries between the homogeneous parts in an image. This detection can be exploited by other computer vision tasks like object detection [11], image segmentation [12], and image reconstruction [13].
Figure 1: The proposed architecture for edge detection. Convs represents the list of convolutional layers. P is a pooling layer. B_i represents the block of convolutional and pooling layers that allows deep learning from the extracted features at each stage.
Also, besides RGB images, edges can be detected in infrared images [14]. This can be done with the analysis of low-level and high-level features, which makes the processing of the image cover all of its components. These features are also used with deep learning models to reach a detection with high performance. Many methods have been proposed for this purpose, but detecting object boundaries while conserving some details about the objects affects the ability of these methods to achieve optimal results [15]. In addition, scale variations, the shapes of objects, and the intensities of image regions affect the quality of detection for all the proposed methods.
In order to overcome the cited challenges, researchers have developed different deep learning architectures [16, 17]. For example, some researchers exploited and adapted deep learning backbones like VGG, ResNet, DenseNet, and others to construct new edge detection models [18].
Table 1: Summary of edge detection methods

Task | Method | Technique/Backbone | Dataset
Statistical-based | Martin et al. [1] | Brightness and color gradients, classifier | Berkeley
Statistical-based | Arbelaez et al. [2] | Brightness, color, and texture gradients | BSDS500
Statistical-based | Dollar et al. [3] | Structured random forests, boosted classifier | BSDS500, NYUD
Statistical-based | Mairal et al. [6] | Least-squares reconstruction error minimization | PASCAL VOC
Statistical-based | Ren et al. [7] | Local sparse coding, SVM | BSDS500, NYUD
Statistical-based | Lim et al. [8] | Sketch tokens filter, HOG | BSDS500, PASCAL VOC
Statistical-based | Romani et al. [9] | Variably Scaled Kernels (VSKs) interpolation | Simple images
Statistical-based | Mittal et al. [10] | Simulated triple thresholds | Simple images
Statistical-based | Wang et al. [14] | Spiking neural network | Infrared images
Statistical-based | Sert et al. [15] | Maximum norm entropy (EDA-NMNE) | Simple images
Deep-learning-based | CAFENet [29] | Encoder-decoder / ResNet-34 | FSE-1000, SBD-5
Deep-learning-based | HED [27] | Holistically-Nested Network / VGG16 | BSDS500, NYUD
Deep-learning-based | LPCB [28] | ResNeXt blocks / VGG16 | BSDS500, NYUD
Deep-learning-based | RCF [30] | Richer convolutional features (RCF) / VGG16 | BSDS500, NYUD, Multicue
Deep-learning-based | BDCN [32] | VGG16, cascade network | BSDS500, NYUD, Multicue
Deep-learning-based | REDN [34] | DenseNet, encoder-decoder | BSDS500, NYUD
Deep-learning-based | PiDiNet [35] | CNN, dilation and spatial attention module | BSDS500, NYUD, Multicue
Deep-learning-based | RHN [36] | CNN, residual VGG-16 | BSDS500, NYUD, Multicue
Deep-learning-based | Li et al. [37] | ANDD matrices, CESM, CEDM | BSDS500
Also, some Convolutional Neural Networks (CNNs) with a hierarchical representation have been proposed to benefit from the different features generated by each block of layers [19]. These edge detection methods attempt to visualize the outputs of different layers [20], so the development of edge predictions in the different intermediate layers can be observed, which gives researchers the possibility to enhance or change a part of the network to improve the quality of the predicted edges [21]. Also, to force the first layers of the network to predict edges at different scales, the last layers are chosen to work at a certain scale and enhance the output of the previous layers.
These methods use all the convolutional features that form multi-scale representations, as used for other computer vision tasks, and then combine the outputs of each stage [22], while some of these methods use unified networks with encoder-decoder architectures [23].
Training these models, which can contain millions of parameters, is costly in terms of time and memory, especially when the networks include attention modules. This also represents a challenge for these methods when the edge detector is used in real time.
Exploiting dierent intermediate layers outputs as well as minimizing the
number of parameters in the networks, we proposed a new edge detection
network with a rened representation. The proposed architecture consists
of using dierent block outputs and avoiding the loss of certain features
during pooling operations. This is performed by concatenating each block’s
results with the outputs of the previous layer as illustrated in Figure 1. The
advantage of the proposed model is that the network is not composed of
any dilation layers as well as without attention modules. In order to solve
low-quality detection ( non-maximum pixels and the noise around the edge
region), we used batch-normalization with learnable ane parameters which
makes the proposed network able to remove these kinds of noise around the
edge region. After the experiments on many datasets including BSDS500,
NYUD, and Multicue as well as compared with the state-of-the-art methods,
the obtained results using the proposed architecture are more accurate in
terms of quality, precision, and also outperform all methods, especially for
NYUD dataset. In terms of training and testing time, the proposed model
is less computational time and a shown dierence in FPS values compared
with the most performed methods.
The remainder of the paper is organized as follows. The related works are presented in Section 2. Section 3 describes the proposed edge detection method. The obtained results and their discussion are provided in Section 4. A conclusion is presented in Section 5.

Figure 2: The structure of edge detection architectures: (a) multi-scale representation, as used by HED [27]; (b) multi-scale representation with attention modules or convolutional layers, as exploited by RCF [30], BDCN [32], PiDiNet [35], and RHN [36]; (c) our multi-scale representation.
2. Related works
Edge detection, also named boundary or contour detection, is still one of the challenging tasks in computer vision, whether using statistical or deep-learning-based methods. Statistical-based methods exploit color and local brightness features [24], clustering algorithms [25], or local image patches [26] to detect the edges in an image. Statistical-based methods can work on a set of data, but the detection relies on the selected features, which makes these methods accurate for some types of scenarios and less performant for others [4, 7, 6, 8]. Mainly low-level features like pixel intensities and color gradients [9, 10], as well as object texture, are exploited with learning approaches to detect the content boundaries in the images [14, 15]. Even so, these methods have limitations such as low real-time accuracy, weak high-level information extraction, and sensitivity to scale and environmental changes.
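For reference, a classical baseline of this kind can be sketched with OpenCV (a generic illustration, not tied to any of the cited works; the input path and thresholds are hypothetical):

```python
# Minimal classical edge-detection baseline with Sobel and Canny.
import cv2
import numpy as np

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical path

# Sobel: first-order gradients along x and y, combined into a magnitude map.
gx = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)
gy = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)
magnitude = np.sqrt(gx ** 2 + gy ** 2)
sobel_edges = np.uint8(255 * magnitude / magnitude.max())

# Canny: smoothing, gradient, non-maximum suppression, and
# double-threshold hysteresis (thresholds chosen by hand).
canny_edges = cv2.Canny(img, 100, 200)
```

The hand-tuned thresholds illustrate the sensitivity to scene changes discussed above: a threshold pair that works for one image often fails for another.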
For deep-learning-based methods, which use convolutional neural networks trained on large-scale datasets [29, 30, 31], analyzing a large set of scenarios and scene types becomes possible, which makes such methods generic. These methods attempt to work on scale representations to extract the edges at each block of the network; these features are then fused to construct the final edge detection result. For that, some authors proposed a unified network based on an existing feature extraction backbone. For example, in [27] the VGG-16 network is used for the Holistically-Nested Edge Detection architecture (HED). The same backbone has been used by Deng et al. [28] for boundary detection (LPCB). The LPCB method is a unified method (no multi-scale representation) that consists of connected parallel layers with ResNeXt blocks. Instead of predicting the edges directly from the input image, the authors in [29] first segment the salient object before extracting the edges from the segmentation results, using a deep learning architecture named CAFENet.
For example, in [30] the authors proposed an architecture named Richer Convolutional Features (RCF) that exploits the VGG-16 backbone of 13 layers divided into 5 stages, which gives the ability to extract features from multi-level and multi-scale representations. In the same context, the authors in [32] proposed a Bi-Directional Cascade Network (BDCN) that consists of a set of blocks separated by pooling layers, using VGG-16 as a backbone. Each block output is taken into consideration for the final edge results. The scale variation is handled using a Scale Enhancement Module (SEM) connected to each convolutional layer in the network; the SEM allows multi-scale representations for the edge detection learning process. Another method, named REDN, is proposed in [34]. Unlike the other methods that use unified networks with a succession of blocks of convolutional and pooling layers, REDN is an encoder-decoder method for edge detection. The proposed architecture exploits the DenseNet architecture as the encoder of the model, while transposed convolutions are used for the decoder side.
Like in [31, 32], the authors in [35] proposed a deep learning architecture for edge detection named PiDiNet. The proposed method exploits a feature extraction network as the backbone, with 4 blocks separated by pooling layers. A dilation convolution module and a spatial attention module are applied to the results of each block (stage). The results of all stages are used to generate the edge detection map. The proposed method has several versions, including baseline, tiny, and small. In the same context, the authors in [36] proposed an edge detection model named RHN based on an extended version of VGG-16, named residual VGG-16, for feature extraction. The authors use the outputs of each backbone stage, separated by pooling layers, as the input of a new block of convolutional and upsampling layers. All the outputs are concatenated to generate the final edge map. The authors in [37] proposed a deep learning method for edge detection in which the images are preprocessed before being introduced into the model. The method starts by extracting features from the R, G, and B images to compute anisotropic directional derivative (ANDD) matrices. The feature results are used for post-processing with image decomposition by color edge strength maps (CESMs) and color edge direction maps (CEDMs). The CESM features are used in the classification stage to generate the edge map. From the experiments and the comparisons with deep learning methods, this method is less accurate regarding the refinement of the edge regions. In the same context, the authors in [38] proposed a Lightweight Dense Convolutional (LDC) architecture with a reduced number of parameters.
With the introduction of attention modules and their performance on Natural Language Processing (NLP) tasks, computer vision researchers attempted to combine self-attention with CNN-based architectures to improve the performance of image processing tasks [39]. This led to the wide use of attention instead of convolutional networks for detection or segmentation purposes. In addition, it became the basis of another attention-based technique, named Transformers, which works well in NLP [40]. Due to the success of transformer networks in NLP, researchers attempted to exploit them on images for segmentation and detection tasks [41]. For that, the authors in [42] proposed a vision-transformer-based model for edge detection. The proposed method, named EDTER, consists of learning from transformer features, then detecting the edges after refining them to obtain the final results. Using a two-stage model, EDTER is costly in terms of complexity and number of parameters compared to the other models, as it costs more than 900.0 GFLOPs over the two stages.
From all the results obtained by the proposed methods, we can observe that the architectures based on multi-scale representations are the most accurate; also, these methods use the same backbone for feature extraction, which is VGG-16. Among the datasets used for training and evaluating the proposed methods, the NYUD dataset is the most challenging one compared with the others, like the BSDS500 and Multicue datasets.
3. Proposed method
3.1. Formulation
Suppose that a sample of the training set T is denoted by (X, Y), where X = {x_j; j = 1, ..., |X|} represents the raw input image and Y = {y_j; j = 1, ..., |X|}, y_j in {0, 1}, is the ground-truth edge map. Also let E = {E_i^j; i = 1, ..., M; j = 1, ..., N}, where M represents the number of blocks in the network and N represents the number of outputs of each block.

Figure 3: Flowchart of the proposed edge detection network.
The edge of an object can be extracted during the learning process by removing meaningless information as well as representing different scales. For that, we select the output of each block as well as the output of each scale of the network to get the final edge map. The edge map Y is a combination of the outputs of the S binary images at the different scales of the network and the outputs E of each block, which are used to conserve the resolution of the edge map. The two components of the proposed methodology can be expressed as follows:

Y = \sum_{s=1}^{S} \left( Y_s + \sum_{i=1}^{N} E_i^s \right)    (1)
where Y_s contains the generated edges corresponding to a scale s. The scale represents the part of the network that is composed of a set of convolutional layers without pooling operations, unlike the existing methods.

Figure 4: Feature maps generated at each stage of the proposed network. The first column represents the feature maps of the third convolutional layer before the first block. Columns 2, 3, 4, and 5 represent the results of Y_2, E_2, Y_3, and E_3, respectively.

Our purpose is to generate the edges without losing the features that represent the edges in the image at high resolution. The resolution conservation is performed by using only convolutional layers and combining them with the blocks at each stage. Each block performs the scaling operation: it is a set of convolutional and pooling layers that makes adjacent convolutional layers depict image patterns at different scales.
During the training on an image X, the feature map generated by the s-th convolutional layer, N_s(X), represents the generated edge Y_s. Y_s is used as input of the block B_s. After consecutive convolutional and pooling layers within B_s, we generate N features E_i^s, i = 1, ..., N, that represent the edge maps of each scale (separated by pooling layers). The output of each scale is the combination of the E_i^s within the block B_s and the generated edge map Y_s. This operation ensures the resolution conservation that can be affected by pooling operations. The information loss of the edge map generated within the s-th scale is less than the loss of Y_s at the same scale:

L_s\left( Y_s + \sum_{i=1}^{N} E_i^s \right) < L_s(Y_s)    (2)
Using the concatenation of the different scale outputs, the final estimated edge map, if we are using 3 scales, can be defined as follows:

\hat{Y} = Y_1 + \sum_{i=1}^{N} E_i^1 + Y_2 + \sum_{i=1}^{N} E_i^2 + Y_3 + \sum_{i=1}^{N} E_i^3    (3)
The number of scales can vary from 3 to n_s, and the performance improves proportionally to the number of scales n_s, while the final result gets closer to the ground truth when n_s is high:

\hat{Y} = \sum_{s=1}^{n_s} \left( Y_s + \sum_{i=1}^{N} E_i^s \right)    (4)
Figure 3 presents the flowchart of the proposed architecture, while Figure 4 illustrates all the outputs within each block of the network.
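To make the fusion of Equations (1)-(4) concrete, the following PyTorch sketch (our own illustration under assumed tensor shapes, not the authors' released code; see the repository linked in the abstract for the reference implementation) sums the per-scale maps Y_s with the per-block outputs E_i^s:

```python
import torch

def fuse_edge_maps(Y, E):
    """Combine per-scale edge maps as in Eq. (1)/(4).

    Y: list of n_s tensors, each of shape (B, 1, H, W) -- the scale outputs Y_s.
    E: list of n_s lists; E[s] holds the N block outputs E_i^s, each (B, 1, H, W),
       assumed already upsampled to the input resolution.
    """
    fused = torch.zeros_like(Y[0])
    for s, Ys in enumerate(Y):
        # Y_s plus the sum over i of E_i^s for this scale.
        fused = fused + Ys + torch.stack(E[s], dim=0).sum(dim=0)
    return torch.sigmoid(fused)  # final edge probability map
```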
3.2. Proposed Architecture
The existing edge detection methods exploit different feature extraction networks as the backbone of their edge detection architectures; VGG-16 is the most used backbone for edge detection. These methods follow a common strategy, which is the collaboration of the different stages of the network to elaborate the final edge result. This collaboration can take various forms, as illustrated in Figure 2. The output of the last layers produces low-quality images, and some regions are misdetected and contain noise, due to the use of a set of pooling layers in the unified network, which can affect the quality of the edge map.
In order to maintain the high resolution of edges during the training process, we take inspiration from [33] as well as from existing edge detection models like [30, 32, 34] that use multiple stages with the same scenario. The same strategy is followed in our proposed models for edge detection. The loss of information during the pooling operations makes the resolution of the final layer output very low, and the contextual information cannot be effective during the fusion process. For that, and in order to conserve the resolution of the edge image throughout the network stages, we propose a network in which the pooling output at each stage is connected and combined with the previous layer's output. As presented in Figure 3, the multi-resolution sub-network is connected with the output of the last layer, which is always connected with the output of the convolutional layers without using the pooling operation. The fusion operation is made at each level of the network, while the final output is a concatenation of each stage of the network.
Table 2: Comparison of edge detection methods on the BSDS500 dataset.

Method | ODS | OIS | AP | FPS
HED [27] | 0.788 | 0.808 | 0.840 | 78
LPCB [28] | 0.808 | 0.824 | - | 30
RCF [30] | 0.806 | 0.823 | - | 30
RCF-MS [30] | 0.811 | 0.830 | - | 8
REDN [34] | 0.808 | 0.828 | - | -
REDN (+PASCAL) [34] | 0.761 | 0.785 | - | -
PiDiNet-MS [35] | 0.807 | 0.823 | - | 92
PiDiNet-small [35] | 0.798 | 0.814 | - | 148
RHN [36] | 0.817 | 0.833 | - | 33
CED [19] | 0.794 | 0.811 | - | -
CED-MS [19] | 0.815 | 0.833 | - | -
Li et al. [37] | 0.731 | 0.760 | 0.605 | -
BDCN [32] | 0.806 | 0.826 | 0.847 | -
BDCN-MS [32] | 0.828 | 0.844 | 0.890 | -
EDTER [42] | 0.824 | 0.841 | 0.880 | -
Ours-3 | 0.787 | 0.788 | 0.801 | 86
Ours-MS-3 | 0.816 | 0.845 | 0.846 | 86
Ours-MS-5 | 0.830 | 0.853 | 0.870 | 32
The number of blocks can vary from 3 to N. In the experiments, we used 3 blocks and 5 blocks. In the last two blocks, we used a batch normalization layer after each convolutional layer to enhance the quality of the generated edges by removing the non-maximum regions between the edges. This is performed by activating the affine transformation parameter (affine=True) in the batch normalization function, which makes the weight and bias parameters learnable during the training process.
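As a sketch of this design choice (the layer widths and activation placement are our assumptions, not the paper's exact configuration), one such block with batch normalization in affine mode could look like:

```python
import torch.nn as nn

class EdgeBlock(nn.Module):
    """One convolution/pooling block. BatchNorm2d(affine=True) enables the
    learnable weight (gamma) and bias (beta), which the text describes as
    acting like an erosion operation on homogeneous regions."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch, affine=True),  # affine=True is PyTorch's default
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch, affine=True),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),  # the pooling step P
        )

    def forward(self, x):
        return self.layers(x)
```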
Figure 5: Precision-Recall curves of the proposed models compared with the best state-of-the-art methods on the BSDS500 dataset (F-scores: Human 0.80; BDCN 0.83; Ours-MS-5 0.83; CED-MS 0.82; PiDiNet-MS 0.82; Ours-MS-3 0.82; RCF-MS 0.81; RCF 0.80; CED 0.80; AMH-Net 0.79; Ours 0.79; PiDiNet 0.77).
4. Experimental results
This section demonstrates the experimental results obtained by the proposed method on three public edge detection datasets: BSDS500, NYUD, and Multicue. The evaluations and comparisons have been performed to prove the effectiveness of the proposed architecture regarding the quality and quantity of the detection, as well as the computational time of our method against the existing ones. The obtained results are compared with a set of accurate state-of-the-art methods, including HED [27], LPCB [28], REDN [34], RCF [30], PiDiNet [35], RHN [36], BDCN [32], and Li et al. [37]. The results are also presented by visualizing some examples from each dataset.
4.1. Implementation details
In order to train and evaluate the proposed method, the original split of each dataset has been used. Also, the data augmentation used by the existing methods has been respected. For example, the training set of the BSDS500 dataset was augmented with flipping (2×), scaling (3×), and rotation (16×), which leads to a dataset 96× larger than the original version.
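The 96× factor is simply 2 (flips) × 3 (scales) × 16 (rotations). A minimal sketch of the paired augmentation follows, where the concrete flip/scale/rotation values and the transform callable are our assumptions based on common practice, not values taken from the paper:

```python
import itertools

# 2 flips x 3 scales x 16 rotations = 96 variants per training image.
flips = [False, True]
scales = [0.5, 1.0, 1.5]                 # assumed scale factors
angles = [k * 22.5 for k in range(16)]   # assumed 16 evenly spaced rotations

def augment_pairs(image, label, transform):
    """Apply the same geometric transform to the image and its edge map,
    so that the ground truth stays aligned with the augmented input."""
    variants = []
    for flip, scale, angle in itertools.product(flips, scales, angles):
        variants.append((transform(image, flip, scale, angle),
                         transform(label, flip, scale, angle)))
    return variants  # 96 (image, label) pairs
```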
We trained the proposed model on a laptop with 16 GB of RAM and an NVIDIA 1070 GPU. The code is implemented in Python using the PyTorch library. The parameters of the model are the same as for the PiDiNet method [35]: the Adam optimizer is used with a learning rate of 0.005, and the weight decay is set to 0.1. We use the same loss function as the PiDiNet method [35], with the parameters λ and η, where λ is set to 1.1 for both BSDS500 and Multicue, and to 1.3 for NYUD. The threshold η is set to 0.3 for both BSDS500 and Multicue; no η is needed for NYUD since its images are singly annotated.
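This loss can be sketched as the annotator-robust, class-balanced cross-entropy popularized by RCF and PiDiNet (our re-implementation of the published formulation; consult [35] for the exact definition):

```python
import torch
import torch.nn.functional as F

def edge_loss(pred, target, lam=1.1, eta=0.3):
    """Weighted cross-entropy for edge maps, in the spirit of RCF/PiDiNet.

    pred:   (B, 1, H, W) edge probabilities in [0, 1].
    target: (B, 1, H, W) annotations averaged over annotators, in [0, 1];
            pixels with 0 < target < eta are ambiguous and ignored.
    """
    pos = (target >= eta).float()
    neg = (target == 0).float()
    valid = pos + neg                      # ambiguous pixels get weight 0
    n_pos, n_neg = pos.sum(), neg.sum()
    # Class-balancing weights: edge pixels are rare, so they are up-weighted.
    alpha = lam * n_neg / (n_pos + n_neg)
    beta = n_pos / (n_pos + n_neg)
    weight = alpha * pos + beta * neg
    return F.binary_cross_entropy(pred, pos, weight=weight,
                                  reduction='sum') / valid.sum()
```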
The proposed method uses parameters that can be adapted to the training dataset, such as the number of blocks and the number of epochs used for training the model. The number of blocks used for training the proposed model ranges from 3 to 5, as in BDCN [32].

Figure 6: Some results of the detected edges on the BSDS500 dataset (the image and ground truth (GT), alongside the results of CED, RCF, BDCN, PiDiNet, and our method).
4.2. BSDS500 Dataset evaluation
The BSDS500 dataset is composed of 500 images: 200 are dedicated to the training set, 100 to validation, and 200 to testing. It is one of the most used datasets for detecting and extracting the edges in an image. Like the existing methods, including RCF, PiDiNet, and BDCN, we used the same data augmentation for training the proposed method, with flipping, scaling, and rotation operations of 2×, 3×, and 16×, respectively. The data has also been mixed with PASCAL-VOC data, which is likewise augmented with the flipping operation to make it larger. In the evaluation stage, we exploit the non-maximum suppression (NMS) technique to thin and normalize the detected edges. We compared the obtained results with the most accurate methods in the literature, including HED [27], CED [19], RCF [30], PiDiNet [35], BDCN [32], and RHN [36]. The comparison is demonstrated using the ODS, OIS, AP, and FPS metrics reported in Table 2, with both single-scale and multi-scale (MS) versions of the proposed method (the final MS result is the combination of the different scale outputs), the Precision-Recall curves presented in Figure 5, and the visualization of some obtained edge maps illustrated in Figure 6.
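Multi-scale (MS) inference can be sketched as follows; the scale set 0.5×/1×/1.5× is a common choice that we assume here rather than take from the paper, and the model is assumed to return a single fused edge map:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def predict_multiscale(model, image, scales=(0.5, 1.0, 1.5)):
    """Average the edge maps predicted at several input resolutions."""
    _, _, h, w = image.shape
    fused = torch.zeros(1, 1, h, w, device=image.device)
    for s in scales:
        resized = F.interpolate(image, scale_factor=s, mode='bilinear',
                                align_corners=False)
        edge = model(resized)  # assumed shape (1, 1, h*s, w*s)
        # Bring each prediction back to the original resolution before averaging.
        fused += F.interpolate(edge, size=(h, w), mode='bilinear',
                               align_corners=False)
    return fused / len(scales)
```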
From Table 2, Figure 5, and Figure 6, we can observe the performance of the proposed method compared with the existing methods. From the table, the proposed method achieved 0.830 for the ODS metric, 0.853 for the OIS metric, and 0.870 for the AP metric, outperforming the CED, RHN, and BDCN-MS methods, which come in the following places, by ODS differences of 1.5%, 1.3%, and 0.2%, respectively. The same observation holds for the other methods as well as for the output results presented with the Precision-Recall curves in Figure 5, where the proposed method's curve is more stable.
For the edge maps illustrated in Figure 6, the results obtained using the proposed method are more refined, with a higher resolution of the detected edges, compared with the RCF, BDCN, and PiDiNet methods, while the CED results are improved and close to ours. For example, the second result shows an accurate detection compared with the ground-truth image, even though the image contains regions that could be misclassified as edges. The high resolution of the obtained results is the benefit of using the proposed refined network described before, with batch normalization.
Figure 7: Precision-Recall curves of the proposed models compared with the best state-of-the-art methods on the NYUD dataset (F-scores: Ours 0.77; BDCN 0.75; RHN 0.75; RCF 0.74; PiDiNet 0.73; HED 0.72).
4.3. NYUD Dataset evaluation
NYUDv2 is another edge detection dataset; it contains two sets of 1449 images, one with RGB images and one with the same images in a depth representation. The same data augmentation process has been performed, with three operations: rotation (4×), flipping (2×), and scaling (3×). The proposed method has been trained on the RGB set, the depth set (HHA), and the combination of the two sets (RGB+HHA). The same metrics exploited on BSDS500, such as ODS and OIS, have been used for evaluating the proposed method as well as the state-of-the-art methods (a schematic definition of these metrics is sketched after Table 4).
Table 3: Comparison of edge detection methods on the NYUD dataset.

Method | RGB ODS | RGB OIS | HHA ODS | HHA OIS | RGB-HHA ODS | RGB-HHA OIS | FPS
HED [27] | 0.720 | 0.734 | 0.682 | 0.695 | 0.746 | 0.761 | 62
LPCB [28] (2018) | 0.739 | 0.754 | 0.707 | 0.719 | 0.762 | 0.778 | -
RCF [30] (2019) | 0.743 | 0.757 | 0.703 | 0.717 | - | - | 20
BDCN [32] (2019) | 0.748 | 0.763 | 0.707 | 0.719 | 0.765 | 0.781 | -
PiDiNet [35] (2021) | 0.733 | 0.747 | 0.715 | 0.728 | 0.756 | 0.773 | 62
RHN [36] (2021) | 0.751 | 0.762 | 0.711 | 0.721 | 0.772 | 0.789 | 24
Ours-3 | 0.729 | 0.745 | 0.718 | 0.731 | 0.750 | 0.774 | 51
Ours-MS-3 | 0.756 | 0.769 | 0.743 | 0.755 | 0.776 | 0.778 | 56
Ours-MS-5 | 0.774 | 0.795 | 0.771 | 0.793 | 0.791 | 0.805 | 34
Table 4: Comparison of edge detection methods on the Multicue dataset.

Method | Boundary ODS | Boundary OIS | Edge ODS | Edge OIS | FPS
RCF [30] | - | - | 0.857 | 0.862 | 15
BDCN [32] | 0.836 | 0.846 | 0.891 | 0.898 | 9
PiDiNet [35] | 0.818 | 0.830 | 0.855 | 0.860 | 17
RHN [36] | 0.841 | 0.856 | 0.896 | 0.905 | -
Ours-MS-5 | 0.859 | 0.863 | 0.907 | 0.922 | 23
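For reference, ODS fixes a single binarization threshold for the whole dataset, while OIS selects the best threshold per image. Schematically (our illustration of the standard definitions, not the benchmark's official code):

```python
import numpy as np

def f_measure(tp, fp, fn):
    precision = tp / np.maximum(tp + fp, 1e-9)
    recall = tp / np.maximum(tp + fn, 1e-9)
    return 2 * precision * recall / np.maximum(precision + recall, 1e-9)

def ods_ois(tp, fp, fn):
    """tp, fp, fn: arrays of shape (num_images, num_thresholds) holding
    matched edge-pixel counts per image and threshold (after NMS and
    tolerance-based matching to the ground truth).

    ODS (Optimal Dataset Scale): one threshold for the whole dataset,
    counts aggregated over images before computing F.
    OIS (Optimal Image Scale): the best threshold is chosen per image,
    then the F-scores are averaged.
    """
    ods = f_measure(tp.sum(0), fp.sum(0), fn.sum(0)).max()
    ois = f_measure(tp, fp, fn).max(axis=1).mean()
    return ods, ois
```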
For evaluating the proposed method, we used a maximum tolerance of 0.011 instead of the 0.0075 used for the BSDS500 dataset, because the NYUD images are larger than the BSDS500 ones. We also performed the training, testing, and evaluation of the proposed model on the three parts of the dataset: the RGB set, the HHA set, and the RGB+HHA set. Table 3 presents the results obtained using the single-scale and multi-scale (MS) models compared with the state-of-the-art methods for the three parts, while Figure 7 illustrates the Precision-Recall curves of these methods on the RGB set.

Figure 8: Some results of the detected edges from the NYUD dataset.

From Table 3 we can see that the proposed method outperforms the other methods on both metrics, ODS and OIS. The proposed method with MS-3 and MS-5 achieved 0.756 and 0.774 for the ODS metric on the RGB set, outperforming BDCN by 2.8%, PiDiNet by 4.3%, and RHN by 2.5%.
This observation also holds on the HHA set and the RGB+HHA set. Figure 7 supports the results in Table 3, where we can see that the Precision-Recall curve generated by the proposed method is more stable compared to the state-of-the-art methods.
In addition to the quantitative results presented in Table 3, Figure 8 illustrates some results obtained using the proposed method on the NYUD dataset. From the visualized results, we can see the quality of the detected edges as well as the ability of the proposed method to detect edge regions without "ghosts", i.e., spurious edges in regions that do not contain any edge.
4.4. Multicue Dataset evaluation
Another dataset for edge detection, named Multicue, contains 100 real-world images of natural scenes and is considered a challenging edge detection dataset. Multicue is labeled with both edges and boundaries. The dataset has been augmented with flipping (2×), scaling (3×), and rotation (16×) operations. The proposed method has been evaluated using the ODS and OIS metrics. Table 4 presents the results obtained by each method for the edge and boundary labels. From the table, we can see that the proposed method and the RHN method reach the best ODS and OIS values for the edge and boundary representations, while the proposed method is better than RHN by 1% for ODS and OIS on the boundary label, and by 1% for ODS and 2% for OIS on the edge label.
5. Conclusion
This paper proposes a refined edge detection method for detecting the contour boundaries of image content. The proposed method consists of detecting edges by benefiting from the multi-scale representation of the network as well as conserving the high resolution of the output map. Unlike the existing methods, this is ensured by using interconnections between consecutive parts of the network, exploiting the refined batch normalization layer, and involving the output of the first layers in the final results through a fusion operation. We evaluated the proposed method on three competitive datasets and compared the obtained results with the best edge detection methods, achieving promising performance rates in the different scenarios. Using the proposed method with 3 blocks (Ours-3 and Ours-MS-3), 86 frames per second (FPS) can be processed; given the obtained accuracy, this makes the proposed method suitable for real-time use and, as a perspective for future work, useful for semantic edge segmentation, since a basic camera records at 60 FPS in Full HD quality.
References
[1] S. Guiming, S. Jidong, Multi-scale harris corner detection algorithm
based on Canny edge-detection, in: 2018 IEEE International Confer-
ence on Computer and Communication Engineering Technology, 2018,
pp. 305–309.
[2] Hallman, S., and Fowlkes, C. C. (2015). Oriented edge forests for bound-
ary detection. In Proceedings of the IEEE conference on computer vision
and pattern recognition (pp. 1732-1740).
[3] Lopez-Molina, C., Galar, M., Bustince, H., and De Baets, B. (2014). On the impact of anisotropic diffusion on edge detection. Pattern Recognition, 47(1), 270-281.
[4] S. Zheng, Z. Tu, and A. Yuille. Detecting object boundaries using low-
,mid-, and high-level information. In CVPR, 2007.
[5] P.A. Flores Vidal, G. Villarino, D. Gómez, J. Montero, A new edge de-
tection method based on global evaluation using supervised classication
algorithms, Int. J. Comput. Intell. Syst. 12 (2019) 367–378.
[6] J. Mairal, M. Leordeanu, F. Bach, M. Hebert, and J. Ponce. Discriminative sparse image models for class-specific edge detection and image interpretation. In ECCV, 2008.
[7] X. Ren and B. Liefeng. Discriminatively trained sparse code gradients for
contour detection. In NIPS, 2012.
[8] J. Lim, C. L. Zitnick, and P. Dollar. Sketch tokens: A learned mid-level
representation for contour and object detection. In CVPR, 2013.
[9] Romani, L., Rossini, M., & Schenone, D. (2019). Edge detection meth-
ods based on RBF interpolation. Journal of Computational and Applied
Mathematics, 349, 532-547.
[10] Mittal, M., Verma, A., Kaur, I., Kaur, B., Sharma, M., Goyal, L. M., ... & Kim, T. H. (2019). An efficient edge detection approach to provide better edge connectivity for image analysis. IEEE Access, 7, 33240-33255.
[11] Ji, G. P., Zhu, L., Zhuge, M., & Fu, K. (2022). Fast camouflaged object detection via edge-based reversible re-calibration network. Pattern Recognition, 123, 108414.
[12] Xu, X., Chen, J., Zhang, H., Han, G. (2022). SA-DPNet: Structure-
aware dual pyramid network for salient object detection. Pattern Recog-
nition, 127, 108624.
[13] Lin, J., Cai, Y., Hu, X., Wang, H., Yuan, X., Zhang, Y., ... Van Gool, L. (2022). Coarse-to-fine sparse transformer for hyperspectral image reconstruction. arXiv preprint arXiv:2203.04845.
[14] Wang, B., Chen, L. L., & Zhang, Z. Y. (2019). A novel method on the
edge detection of infrared image. Optik, 180, 610-614.
[15] Sert, E., & Avci, D. (2019). A new edge detection approach via neutrosophy based on maximum norm entropy. Expert Systems with Applications, 115, 499-511.
[16] Le, M., & Kayal, S. (2021, July). Revisiting Edge Detection in Con-
volutional Neural Networks. In 2021 International Joint Conference on
Neural Networks (IJCNN) (pp. 1-9). IEEE.
[17] Orhei, C., Bogdan, V., Bonchis, C., & Vasiu, R. (2021). Dilated Filters
for Edge-Detection Algorithms. Applied Sciences, 11(22), 10716.
[18] Elharrouss, O., Akbari, Y., Almaadeed, N., & Al-Maadeed, S. (2022).
Backbones-Review: Feature Extraction Networks for Deep Learn-
ing and Deep Reinforcement Learning Approaches. arXiv preprint
arXiv:2206.08016.
[19] Y. Wang, X. Zhao, and K. Huang. Deep crisp boundaries. In CVPR,
2017.
[20] Jing, J., Liu, S., Wang, G., Zhang, W., & Sun, C. (2022). Recent ad-
vances on image edge detection: A comprehensive review. Neurocomput-
ing.
[21] Bertasius, G., Shi, J., & Torresani, L. (2015). Deepedge: A multi-scale
bifurcated deep network for top-down contour detection. In Proceedings
of the IEEE conference on computer vision and pattern recognition (pp.
4380-4389).
[22] Cai, Y., Wang, Z., Luo, Z., Yin, B., Du, A., Wang, H., ... & Sun, J.
(2020, August). Learning delicate local representations for multi-person
pose estimation. In European Conference on Computer Vision (pp. 455-
472). Springer, Cham.
[23] Deng, R., & Liu, S. (2020, October). Deep structural contour detection.
In Proceedings of the 28th ACM international conference on multimedia
(pp. 304-312).
[24] D. R. Martin, C. C. Fowlkes, and J. Malik, “Learning to detect natural
image boundaries using local brightness, color, and texture cues,” IEEE
Trans. Pattern Anal. Mach. Intell., vol. 26, no. 5, pp. 530–549, 2004.
[25] P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik, “Contour detection
and hierarchical image segmentation,” IEEE Trans. Pattern Anal. Mach.
Intell., vol. 33, no. 5, pp. 898–916, 2011.
[26] P. Dollar and C. L. Zitnick, "Fast edge detection using structured forests," IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 8, pp. 1558-1570, 2015.
[27] Xie, S., & Tu, Z. (2015). Holistically-nested edge detection. In Pro-
ceedings of the IEEE international conference on computer vision (pp.
1395-1403).
[28] Ruoxi Deng, Chunhua Shen, Shengjun Liu, Huibing Wang, and Xinru
Liu. Learning to predict crisp boundaries. In ECCV, pages 562–578, 2018.
[29] Park, Y. H., Seo, J., & Moon, J. (2020). Cafenet: class-agnostic few-shot
edge detection network. arXiv preprint arXiv:2003.08235.
[30] Liu, Y., Cheng, M. M., Hu, X., Wang, K., & Bai, X. Richer Convo-
lutional Features for Edge Detection, in IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol. 41, no. 8, pp. 1939-1946, 1 Aug.
2019, doi: 10.1109/TPAMI.2018.2878849.
[31] Wang, L., Shen, Y., Liu, H., & Guo, Z. (2019). An accurate and efficient multi-category edge detection method. Cognitive Systems Research, 58, 160-172.
[32] He, Jianzhong, et al. "Bi-directional cascade network for perceptual edge detection." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.
[33] Sun, K., Xiao, B., Liu, D., Wang, J. (2019). Deep high-resolution rep-
resentation learning for human pose estimation. In Proceedings of the
IEEE/CVF conference on computer vision and pattern recognition (pp.
5693-5703).
[34] Le, T., & Duan, Y. (2020). REDN: a recursive encoder-decoder network
for edge detection. IEEE Access, 8, 90153-90164.
[35] Su, Z., Liu, W., Yu, Z., Hu, D., Liao, Q., Tian, Q., ... & Liu, L. (2021). Pixel difference networks for efficient edge detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 5117-5127).
[36] Al-Amaren, A., Ahmad, M. O., & Swamy, M. N. S. (2021). RHN: A
Residual Holistic Neural Network for Edge Detection. IEEE Access, 9,
74646-74658.
[37] Li, O., & Shui, P. L. (2021). Color edge detection by learning classi-
cation network with anisotropic directional derivative matrices. Pattern
Recognition, 118, 108004.
[38] Soria, X., Pomboza-Junez, G., Sappa, A. D. (2022). LDC: Lightweight
Dense CNN for Edge Detection. IEEE Access, 10, 68281-68290.
[39] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez,
A. N., ... & Polosukhin, I. (2017). Attention is all you need. Advances in
neural information processing systems, 30.
[40] Ramachandran, P., Parmar, N., Vaswani, A., Bello, I., Levskaya, A., &
Shlens, J. (2019). Stand-alone self-attention in vision models. Advances
in Neural Information Processing Systems, 32.
[41] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X.,
Unterthiner, T., ... & Houlsby, N. (2020). An image is worth 16x16
words: Transformers for image recognition at scale. arXiv preprint
arXiv:2010.11929.
[42] Pu, M., Huang, Y., Liu, Y., Guan, Q., & Ling, H. (2022). EDTER: Edge
Detection with Transformer. In Proceedings of the IEEE/CVF Confer-
ence on Computer Vision and Pattern Recognition (pp. 1402-1412).