Refined Edge Detection With Cascaded and High-Resolution Convolutional Network

June 2023
Pattern Recognition 138(1):109361

DOI:10.1016/j.patcog.2023.109361

Rened Edge Detection With Cascaded and

High-Resolution Convolutional Network

Omar Elharrouss^a, Youssef Hmamouche^a, Assia Kamal Idrissi^a, Btissam El Khamlichi^a, Amal El Fallah-Seghrouchni^a

^a International Artificial Intelligence Center of Morocco (Ai Movement) - University Mohammed VI Polytechnique

Abstract

Edge detection is one of the most challenging tasks in computer vision, owing to the complexity of detecting edges or boundaries in real-world images that contain objects of different types and scales, such as trees and buildings, as well as various backgrounds. Edge detection is also a key task for many computer vision applications. Using a set of backbones as well as attention modules, deep-learning-based methods have improved the detection of edges compared with traditional methods like Sobel or Canny. However, images of complex scenes still represent a challenge for these methods. Also, the edges detected by existing approaches suffer from non-refined results with erroneous edges. In this paper, we attempt to overcome these challenges for refined edge detection using a cascaded and high-resolution network named CHRNet. To maintain the high resolution of edges during the training process and to conserve the resolution of the edge image across the network stages, sub-blocks are connected at every stage with the output of the previous layer. Also, after each layer, we use a batch normalization layer with an active affine parameter as an erosion operation for the homogeneous regions in the image. The proposed method is evaluated on the most challenging datasets, including BSDS500, NYUD, and Multicue. The obtained results outperform previously designed edge detection networks in terms of performance metrics and quality of output images. The code is available at: https://github.com/elharroussomar/chrnet/

Preprint submitted to Nuclear Physics B, January 26, 2023

Keywords: Edge detection, Convolutional neural networks, Deep learning,

Scale-representation, Backbone.

1. Introduction

Extracting salient edges from natural images is a challenge for computer vision applications [1, 2]. The complexity of images, the objects in them (trees, buildings, cars), and the collisions between image components make the separation of edges a difficult operation for statistical-based methods [3, 4, 5]. Nowadays, with the introduction of deep learning techniques and the improvement in machine performance brought by GPUs, this task has become achievable with convincing accuracy, and it can be implemented in real-time applications [7, 8].

Edge detection is the operation of extracting the contours of different objects while automatically ignoring other details [9, 10]. It is also the operation of labeling the boundaries between the homogeneous parts of an image. This detection can be exploited by other computer vision tasks such as object detection [11], image segmentation [12], and image reconstruction [13]. Besides RGB images, edges can also be detected from infrared images [14]. This can be done by analyzing low-level and high-level features so that the processing covers all the components of the image. These features are also used with deep learning models to reach high detection performance. Many methods have been proposed for this purpose, but detecting object boundaries while conserving some details about the objects limits the ability of these methods to achieve optimal results [15]. In addition, scale variations, object shapes, and the intensities of image regions affect the quality of detection for all the proposed methods.

Figure 1: The proposed architecture for edge detection. Convs represents the list of convolutional layers. P is a pooling layer. B_i represents the block of convolutional and pooling layers that allows deep learning from the extracted features at each stage.

In order to overcome the cited challenges, researchers developed different deep learning architectures [16, 17]. For example, some researchers exploited and adapted deep learning backbones like VGG, ResNet, DenseNet, and others to construct new edge detection models [18]. Other Convolutional Neural Networks (CNNs) with a hierarchical representation were proposed to benefit from the different features generated by each block of layers [19]. These edge detection methods attempted to visualize the outputs of different layers [20]. The development of edge predictions across the intermediate layers can thus be shown, which gives researchers the possibility to enhance or change a part of the network to improve the quality of the predicted edges [21]. Also, to force the first layers of the network to predict edges at different scales, the last layers are chosen to work on a certain scale and enhance the output of the previous layers.

Table 1: Summarization of edge detection methods

Task | Method | Technique/Backbone | Dataset
Statistical-based | Martin et al. [1] | Brightness and Color Gradients, Classifier | Berkeley
Statistical-based | Arbelaez et al. [2] | Brightness, Color, and Texture Gradients | BSDS500
Statistical-based | Dollar et al. [3] | Structured random forests, boosted classifier | BSDS500, NYUD
Statistical-based | Mairal et al. [6] | Least-squares reconstruction errors minimization | PASCAL VOC
Statistical-based | Ren et al. [7] | Local sparse coding, SVM | BSDS500, NYUD
Statistical-based | Lim et al. [8] | Sketch tokens filter, HOG | BSDS500, PASCAL VOC
Statistical-based | Romani et al. [9] | Variably Scaled Kernels (VSKs) interpolation | Simple images
Statistical-based | Mittal et al. [10] | Simulated triple thresholds | Simple images
Statistical-based | Wang et al. [14] | Spiking neural network | Infrared images
Statistical-based | Sert et al. [15] | Maximum norm entropy (EDA-NMNE) | Simple images
Deep-learning-based | CAFENet [29] | Encoder-decoder / ResNet-34 | FSE-1000 and SBD-5
Deep-learning-based | HED [27] | Holistically-Nested Network / VGG16 | BSDS500, NYUD
Deep-learning-based | LPCB [28] | ResNeXt blocks / VGG16 | BSDS500, NYUD
Deep-learning-based | RCF [30] | Richer convolutional features (RCF) / VGG16 | BSDS500, NYUD, Multicue
Deep-learning-based | BDCN [32] | VGG16, Cascade network | BSDS500, NYUD, Multicue
Deep-learning-based | REDN [34] | DenseNet, Encoder-decoder | BSDS500, NYUD
Deep-learning-based | PiDiNet [35] | CNN, Dilation and spatial attention module | BSDS500, NYUD, Multicue
Deep-learning-based | RHN [36] | CNN, Residual VGG-16 | BSDS500, NYUD, Multicue
Deep-learning-based | Li et al. [37] | ANDD matrices, CESM, CEDM | BSDS500

These methods used all the convolutional features that form multi-scale representations, as in other computer vision tasks, and then combined the outputs of each stage [22]. Some of these methods instead used unified networks with encoder-decoder architectures [23]. Training these models, which can contain millions of parameters, is costly in terms of time and memory, especially when the networks include attention modules. This also makes it challenging to use these edge detectors in real time.

By exploiting the outputs of different intermediate layers while minimizing the number of parameters in the network, we propose a new edge detection network with a refined representation. The proposed architecture uses different block outputs and avoids the loss of certain features during pooling operations. This is performed by concatenating each block's results with the outputs of the previous layer, as illustrated in Figure 1. The advantage of the proposed model is that the network contains neither dilation layers nor attention modules. In order to address low-quality detection (non-maximum pixels and noise around the edge region), we use batch normalization with learnable affine parameters, which makes the proposed network able to remove this kind of noise around the edge region. In experiments on BSDS500, NYUD, and Multicue, and in comparison with state-of-the-art methods, the results obtained with the proposed architecture are more accurate in terms of quality and precision and outperform the compared methods, especially on the NYUD dataset. In terms of training and testing time, the proposed model requires less computational time, with a visible difference in FPS values compared with the best-performing methods.

The paper is organized as follows. Related works are presented in Section 2. Section 3 describes the proposed edge detection method. The obtained results and their discussion are provided in Section 4. A conclusion is presented in Section 5.

Figure 2: The structure of edge detection architectures: (a) multi-scale representation; (b) multi-scale representation with attention modules. The first architecture is used by HED [27]. The second architecture is exploited by the RCF [30], BDCN [32], PiDiNet [35], and RHN [36] methods.

2. Related works

Edge detection, also named boundary or contour detection, is still one of the challenging tasks in computer vision, whether using statistical or deep-learning-based methods. Statistical-based methods exploit color and local brightness features [24], clustering algorithms [25], or local image patches [26] to detect the edges in an image. These methods can work on a set of data, but the detection depends on the selected features, which makes them accurate for some types of scenarios and less effective for others [4, 7, 6, 8]. Mainly low-level features such as pixel intensities and color gradients [9, 10], together with object texture, are exploited with learning approaches to detect the content boundaries in images [14, 15]. Even so, these methods have limitations such as low accuracy in real-time settings, difficulty extracting high-level information, and sensitivity to scale and environmental changes.

Deep-learning-based methods that use convolutional neural networks trained on large-scale datasets [29, 30, 31] make it possible to analyze a large set of scenarios and types of scenes, which makes such methods generic. The proposed methods attempt to work on scale representations to extract the edges at each block of the network. These features are then fused to construct the final edge detection results. To that end, some authors proposed unified networks based on existing feature extraction backbones. For example, in [27] the VGG-16 network is used for the Holistically-Nested Edge Detection architecture (HED). The same backbone was used by Deng et al. [28] for boundary detection (LPCB). LPCB is a unified method (no multi-scale representation) that consists of connected parallel layers with ResNeXt blocks. Instead of predicting the edges directly from the input image, the authors in [29] first segment the salient object and then extract the edges from the segmented results using a deep learning architecture named CAFENet.

In [30], the authors proposed an architecture named richer convolutional features (RCF) that exploits the VGG-16 backbone, whose 13 convolutional layers are divided into 5 stages, which gives the ability to extract features from multi-level and multi-scale representations. In the same context, the authors in [31] proposed a Bi-Directional Cascade Network (BDCN) architecture that uses each block of the CNN with a Scale Enhancement Module (SEM) to generate multi-scale features, then concatenates all outputs to obtain the final edge map.

In the same context, the authors in [32] proposed a cascaded network for edge detection named BDCN. The network consists of a set of blocks separated by pooling layers and uses VGG16 as a backbone. Each block output is taken into consideration for the final edge results. Scale variation is handled using a Scale Enhancement Module (SEM) connected to each convolutional layer in the network. The SEM allows multi-scale representations in the edge detection learning process. Another method, named REDN, is proposed in [34]. Unlike the other methods, which use unified networks with a succession of blocks of convolutional and pooling layers, REDN is an encoder-decoder method for edge detection. Its architecture exploits DenseNet as the encoder of the model, while transposed convolutions are used on the decoder side.

As in [31, 32], the authors in [35] proposed a deep learning architecture for edge detection named PiDiNet. The method uses a feature extraction backbone of 4 blocks separated by pooling layers. A dilated convolution module and a spatial attention module are applied to the results of each block (stage). The results of all stages are used to generate the edge detection map. The method has several versions, including baseline, tiny, and small. In the same context, the authors in [36] proposed an edge detection model named RHN, based on an extended version of VGG-16 named residual VGG-16 for feature extraction. The output of each backbone stage (stages being separated by pooling layers) is used as the input of a new block of convolutional and upsampling layers. All the outputs are concatenated to generate the final edge map. The authors in [37] proposed a deep learning method for edge detection in which the images are preprocessed before being introduced into the model. The method starts by extracting the features from the R, G, and B images to compute anisotropic directional derivative (ANDD) matrices. The resulting features are used for post-processing with image decomposition by the color edge strength maps (CESMs) technique and color edge direction maps (CEDMs). CESM features are used in the classification stage to generate the edge map. From the experiments and the comparisons with deep learning methods, this method is less accurate regarding the refinement of the edge regions. In the same context, the authors in [38] proposed a Lightweight Dense Convolutional (LDC) architecture with a reduced number of parameters.

With the introduction of attention modules and their performance on Natural Language Processing (NLP) tasks, computer vision researchers attempted to combine self-attention with CNN-based architectures to improve the performance of image processing tasks [39]. This led to the wide use of attention instead of convolutional networks for detection or segmentation purposes. In addition, attention became the basis of another technique, named Transformers, which works well in NLP [40]. Owing to the success of transformer networks in NLP, researchers attempted to exploit them on images for segmentation and detection tasks [41]. To that end, the authors in [42] proposed a vision-transformer-based model for edge detection. The method, named EDTER, learns from transformer features and then detects the edges after refining them to obtain the final results. Because of its two-stage design, EDTER is costly in terms of complexity and number of parameters compared with the other models: it costs more than 900.0 GFLOPs over the two stages.

From the results reported for these methods, we can observe that the architectures based on multi-scale representations are the most accurate, and that these methods use the same backbone for feature extraction, namely VGG-16. Among the datasets used for training and evaluating these methods, NYUD is the most challenging compared with the others, such as BSDS500 and Multicue.

3. Proposed method

3.1. Formulation

Suppose that a sample of the training set $T$ is denoted by $(X, Y)$, where $X = \{x_j;\ j = 1, \dots, |X|\}$ represents the raw input image and $Y = \{y_j;\ j = 1, \dots, |X|\}$, $y_j \in \{0, 1\}$, is the ground-truth edge map. Also, let $E = \{E_i^j;\ i = 1, \dots, M;\ j = 1, \dots, N\}$, where $M$ represents the number of blocks in the network and $N$ represents the number of outputs of each block.

Figure 3: Flowchart of the proposed edge detection network. (Legend: Conv 3×3, Upsampling, Max Pooling, Concatenation; outputs Y1, Y2, Y3 and E1, E2, E3.)

The edge of an object can be extracted during the learning process by removing meaningless information as well as by representing different scales. To that end, we select the output of each block and the output of each scale of the network to obtain the final edge map. The edge map $Y$ is a combination of the outputs of the $S$ binary images from the different scales of the network and the outputs $E$ of each block, which are used to conserve the resolution of the edge map. The two components of the proposed methodology can be expressed as follows:

$$Y = \sum_{s=1}^{S} \Big( Y_s + \sum_{i=1}^{M} E_i^s \Big) \qquad (1)$$

where $Y_s$ contains the generated edge map corresponding to scale $s$. Here, a scale is the part of the network that is composed of a set of convolutional layers without any pooling operation, unlike in the existing methods.

Figure 4: Feature maps generated at each stage of the proposed network. The first column represents the feature maps of the third convolutional layer before the first block. Columns 2, 3, 4, and 5 represent the results in $Y_2$, $E_2$, $Y_3$, and $E_3$, respectively.

Our purpose is to generate the edge map without losing the features that represent the edges in the image at high resolution. The resolution is conserved by using only convolutional layers and combining them with the blocks at each stage. Each block performs a scaling operation: it is a set of convolutional and pooling layers that makes adjacent convolutional layers depict image patterns at different scales.

During the training on an image $X$, the feature map generated by the $s$-th convolutional layer, $N_s(X)$, represents the generated edge map $Y_s$. $Y_s$ is used as the input of the block $B_s$. After consecutive convolutional and pooling layers within $B_s$, we generate $M$ features $E_i$, $i = 1, \dots, M$, that represent the edge maps of each scale (separated by pooling layers). The output of each scale is the combination of the $E_i$, $i = 1, \dots, M$, within the block $B_s$ and the generated edge map $Y_s$. This operation ensures the conservation of the resolution, which can be affected by the pooling operation. The loss of information of the edge map generated within the $s$-th scale is less than the loss of $Y_s$ alone at the same scale:

$$L_s\Big( Y_s + \sum_{i=1}^{M} E_i^s \Big) < L_s(Y_s) \qquad (2)$$

Using the concatenation of the different scale outputs, the final estimated edge map, if we are using 3 scales, can be defined as follows:

$$Y = Y_1 + \sum_{i=1}^{M} E_i^1 + Y_2 + \sum_{i=1}^{M} E_i^2 + Y_3 + \sum_{i=1}^{M} E_i^3 \qquad (3)$$

The number of scales can vary from 3 to $n_s$, and the performance improves with the number of scales $n_s$; the final results can be close to the ground truth when $n_s$ is high:

$$Y = Y_1 + \sum_{i=1}^{M} E_i^1 + Y_2 + \sum_{i=1}^{M} E_i^2 + Y_3 + \sum_{i=1}^{M} E_i^3 \qquad (4)$$

Figure 3 represents the flowchart of the proposed architecture, while Figure 4 illustrates all the outputs within each block of the network.
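The three-scale combination in Eq. (3) can be illustrated with a small sketch. It is only a toy under stated assumptions: every map is taken to have already been brought to the same resolution, a plain element-wise sum stands in for whatever fusion the network learns, and the names `add_maps` and `fuse_scales` are hypothetical.

```python
def add_maps(a, b):
    """Element-wise sum of two equally sized 2D maps (lists of rows)."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def fuse_scales(scale_outputs):
    """Combine per-scale outputs following the structure of Eq. (3).

    scale_outputs is a list of (Y_s, [E_1^s, ..., E_M^s]) pairs, one per scale.
    All maps are assumed to share the same height and width.
    """
    height = len(scale_outputs[0][0])
    width = len(scale_outputs[0][0][0])
    fused = [[0.0] * width for _ in range(height)]
    for y_s, block_maps in scale_outputs:
        fused = add_maps(fused, y_s)      # contribution of Y_s
        for e_map in block_maps:          # contributions of E_i^s
            fused = add_maps(fused, e_map)
    return fused


# Toy example: 3 scales, each with one Y_s and two block outputs E_i^s.
ones = [[1.0, 1.0], [1.0, 1.0]]
halves = [[0.5, 0.5], [0.5, 0.5]]
fused = fuse_scales([(ones, [halves, halves])] * 3)
```

Each scale contributes its own map plus its block outputs, so adding a scale adds one more term of the same form.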

3.2. Proposed Architecture

Existing edge detection methods exploit different feature extraction networks as the backbone of their edge detection architectures, VGG-16 being the most used. These methods follow a common strategy, which is the collaboration of different stages of the network to produce the final edge results. This collaboration can take various forms, as illustrated in Figure 2. Because the unified network uses a series of pooling layers, the output of the last layers is of low quality: some regions are misdetected and contain noise, which can affect the quality of the edge map.

In order to maintain the high resolution of edges during the training process, we were inspired by [33] as well as by existing edge detection models such as [30, 32, 34], which use multiple stages in the same way. The same strategy is followed in our proposed models for edge detection. The loss of information during the pooling operations makes the resolution of the final layer output very low, so the contextual information cannot be used effectively during the fusion process. For that reason, and in order to conserve the resolution of the edge image across the network stages, we propose a network in which the pooling output at each stage is connected and combined with the previous layer's output. As presented in Figure 3, the multi-resolution sub-network is connected with the output of the last layer, which is always connected with the output of the convolutional layers that do not use the pooling operation. The fusion operation is performed at each level of the network, while the final output is a concatenation of each stage of the network.

Table 2: Comparison of edge detection methods on the BSDS500 dataset.

Method | ODS | OIS | AP | FPS
HED [27] | 0.788 | 0.808 | 0.840 | 78
LPCB [28] | 0.808 | 0.824 | - | 30
RCF [30] | 0.806 | 0.823 | - | 30
RCF-MS [30] | 0.811 | 0.830 | - | 8
REDN [34] | 0.808 | 0.828 | - | -
REDN (+PASCAL) [34] | 0.761 | 0.785 | - |
PiDiNet-MS [35] | 0.807 | 0.823 | - | 92
PiDiNet-small [35] | 0.798 | 0.814 | | 148
RHN [36] | 0.817 | 0.833 | - | 33
CED [19] | 0.794 | 0.811 | - | -
CED [19] | 0.815 | 0.833 | - | -
Li et al. [37] | 0.731 | 0.760 | 0.605 | -
BDCN [32] | 0.806 | 0.826 | 0.847 | -
BDCN-MS [32] | 0.828 | 0.844 | 0.890 | -
EDTER [42] | 0.824 | 0.841 | 0.880 | -
Ours-3 | 0.787 | 0.788 | 0.801 | 86
Ours-MS-3 | 0.816 | 0.845 | 0.846 | 86
Ours-MS-5 | 0.830 | 0.853 | 0.870 | 32
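The idea of attaching a pooled branch back to an un-pooled map can be sketched on a single-channel toy map. This is an illustration under assumptions, not the released implementation: 2×2 max pooling, nearest-neighbour upsampling, and a channel-wise concatenation are assumed from the legend of Figure 3, and the function names are hypothetical.

```python
def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 on a 2D map with even height and width."""
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, len(fmap[0]), 2)]
        for r in range(0, len(fmap), 2)
    ]


def upsample_2x(fmap):
    """Nearest-neighbour upsampling by a factor of 2 in both directions."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def stage(high_res):
    """One hypothetical stage: keep the un-pooled map and attach the pooled branch."""
    pooled = max_pool_2x2(high_res)   # low-resolution branch (scale change)
    restored = upsample_2x(pooled)    # brought back to the input resolution
    return [high_res, restored]       # channel-wise concatenation


y = [[0, 1, 0, 0],
     [0, 9, 0, 0],
     [0, 0, 0, 3],
     [0, 0, 2, 0]]
channels = stage(y)
```

The first channel keeps every pixel of the original map, while the second carries the coarser, pooled view at the same size, so fine detail lost by pooling is still available to later layers.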

The number of blocks can vary from 3 to N. In the experiments, we used 3 blocks and 5 blocks. In the last two blocks, we used a batch-normalization layer after each convolutional layer to enhance the quality of the generated edges by removing the non-maximum regions between the edges. This is performed by activating the affine transformation parameter (affine = True) in the batch-normalization function, which makes the weight and bias part of the learning process.
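To make the role of the affine parameters concrete, the sketch below implements batch normalization on a flat list of activations in plain Python. In PyTorch, the corresponding switch is the `affine` argument of `torch.nn.BatchNorm2d`, which is `True` by default. The bias value and the ReLU that follows are assumptions chosen for illustration; the text does not specify them.

```python
import math


def batch_norm(values, weight=1.0, bias=0.0, eps=1e-5):
    """Normalize a batch of activations, then apply the affine map weight * x_hat + bias."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)  # biased variance
    return [weight * (v - mean) / math.sqrt(var + eps) + bias for v in values]


activations = [0.1, 0.2, 0.15, 0.9]          # one strong response among weak ones
plain = batch_norm(activations)              # affine parameters at their initial values
shifted = batch_norm(activations, weight=1.0, bias=-0.5)
suppressed = [max(v, 0.0) for v in shifted]  # assumed ReLU after a learned negative bias
```

With a negative learned bias followed by a rectifier, the weak responses fall to zero while the strong one survives, which is one way such a layer could act like an erosion of near-edge noise.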

Figure 5: Precision-Recall curves of the proposed models compared with the best state-of-the-art methods on the BSDS500 dataset (x-axis: Recall; y-axis: Precision). Legend: [F=.80] Human, [F=.83] BDCN, [F=.83] Our-MS-5, [F=.82] CED-MS, [F=.82] PidiNet-MS, [F=.82] Ours-MS-3, [F=.81] RCF-MS, [F=.80] RCF, [F=.80] CED, [F=.79] AMH-Net, [F=.79] Ours, [F=.77] PidiNet.

4. Experimental results

This section presents the experimental results obtained with the proposed method on three public edge detection datasets: BSDS500, NYUD, and Multicue. The evaluations and comparisons were performed to assess the effectiveness of the proposed architecture, in terms of both the qualitative and quantitative results of the detection and the computational time, against existing methods. The obtained results are compared with a set of accurate state-of-the-art methods including HED [27], LPCB [28], REDN [34], RCF [30], PiDiNet [35], RHN [36], BDCN [32], and Li et al. [37]. The results are also presented by visualizing some examples for each dataset.

4.1. Implementation details

In order to train and evaluate the proposed method, the original split of each dataset was used. The data augmentation used by the existing methods was also followed. For example, the training set of the BSDS500 dataset was augmented with flipping (2×), scaling (3×), and rotation (16×), which leads to a dataset 96× larger than the original version.
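The 96× factor is simply the product of the three augmentation counts, which can be checked by enumerating the combinations. The specific scale factors and rotation angles below are assumptions for illustration; the text only gives the counts.

```python
from itertools import product

flips = [False, True]                       # 2x: original and mirrored
scales = [0.5, 1.0, 1.5]                    # 3x: assumed scale factors
angles = [i * 360 / 16 for i in range(16)]  # 16x: assumed evenly spaced rotations

variants = list(product(flips, scales, angles))
factor = len(variants)                      # 2 * 3 * 16 = 96
```

Each element of `variants` describes one augmented copy of a training image, so every original image yields 96 training samples.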

To train the proposed model, we used a laptop with 16 GB of RAM and an NVIDIA 1070 GPU. The code is implemented in Python with the PyTorch library. The parameters of the model are the same as in the PidiNet method [35]: the Adam optimizer is used with a learning rate of 0.005, and the weight decay is set to 0.1. We used the same loss function as the PidiNet method [35], with the parameters λ and η. λ is set to 1.1 for both BSDS500 and Multicue and to 1.3 for NYUD. The threshold η is set to 0.3 for both BSDS500 and Multicue. No η is needed for NYUD, since its images are singly annotated.
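The roles of λ and η can be illustrated with a plain-Python sketch of a class-balanced, annotator-robust cross-entropy of the kind used by RCF and PiDiNet. The exact form below is an assumption based on those methods, not a quotation of the training code: pixels whose label exceeds η count as edges, pixels labeled 0 count as non-edges, and pixels in between are ignored. The dictionary layout and function name are hypothetical.

```python
import math

# Values quoted in the text; the dictionary layout itself is hypothetical.
LOSS_SETTINGS = {
    "BSDS500": {"lam": 1.1, "eta": 0.3},
    "Multicue": {"lam": 1.1, "eta": 0.3},
    "NYUD": {"lam": 1.3, "eta": None},  # single annotation, no threshold needed
}


def annotator_robust_loss(probs, labels, lam, eta=None):
    """Class-balanced cross-entropy over per-pixel edge probabilities.

    labels hold the fraction of annotators marking each pixel as an edge.
    """
    threshold = 0.0 if eta is None else eta
    positives = [p for p, y in zip(probs, labels) if y > threshold]
    negatives = [p for p, y in zip(probs, labels) if y == 0]
    total = len(positives) + len(negatives)
    if total == 0:
        return 0.0
    alpha = lam * len(positives) / total  # weight on non-edge pixels
    beta = len(negatives) / total         # weight on edge pixels
    loss = -sum(alpha * math.log(1.0 - p) for p in negatives)
    loss -= sum(beta * math.log(p) for p in positives)
    return loss


probs = [0.9, 0.1, 0.5, 0.2]
labels = [1.0, 0.0, 0.2, 0.0]  # third pixel: weak annotator agreement, ignored
loss = annotator_robust_loss(probs, labels, **LOSS_SETTINGS["BSDS500"])
```

Because edge pixels are rare, the weights α and β rebalance the two classes, and η removes pixels on which the annotators disagree from the loss entirely.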

The proposed method uses parameters that can be adapted to the dataset used in training, such as the number of blocks and the number of epochs. The number of blocks used for training the proposed model ranges from 3 to 5, as in BDCN [32].

Figure 6: Some results of the detected edges on the BSDS500 dataset. (Panel labels: Image, GT, PidiNet, CED, RCF, BDCN, Ours.)

4.2. BSDS500 Dataset evaluation

The BSDS500 dataset is composed of 500 images, of which 200 are dedicated to training, 100 to validation, and 200 to testing. It is one of the most used datasets for detecting and extracting edges in an image. Like existing methods including RCF, PidiNet, and BDCN, we trained the proposed method with the same data augmentation, which uses flipping, scaling, and rotation operations of 2×, 3×, and 16×, respectively. The data was also mixed with PASCAL-VOC data, which is itself augmented with the flipping operation, making it 2× larger. In the evaluation stage, we apply Non-Maximum Suppression (NMS) to thin and normalize the detected edges. We compared the obtained results with the most accurate methods in the literature, including HED [27], CED [19], RCF [30], PidiNet [35], BDCN [32], and RHN [36]. The comparison uses the ODS, OIS, AP, and FPS metrics reported in Table 2 for both the single-scale and multi-scale (MS) versions of the proposed method (in the MS version, the final result is the combination of the outputs at different scales), the Precision-Recall curves presented in Figure 5, and the visualization of some obtained edge maps in Figure 6.
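The difference between ODS (one threshold for the whole dataset) and OIS (the best threshold per image) can be shown with a small sketch. It is a simplification of the usual benchmark procedure: the pixel matching with a distance tolerance is assumed to have been done already, a single true-positive count is used for both precision and recall, and the toy counts are invented for illustration.

```python
def f_measure(tp, n_pred, n_gt):
    """F-measure from matched-pixel counts; returns 0.0 when undefined."""
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gt if n_gt else 0.0
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def ods_ois(counts):
    """counts[image][threshold] = (true positives, predicted edge pixels, ground-truth edge pixels)."""
    n_thresholds = len(counts[0])

    # ODS: a single threshold shared by the whole dataset.
    ods = max(
        f_measure(*[sum(img[t][k] for img in counts) for k in range(3)])
        for t in range(n_thresholds)
    )

    # OIS: best threshold chosen separately for each image, counts then pooled.
    best = [max(img, key=lambda c: f_measure(*c)) for img in counts]
    ois = f_measure(*[sum(c[k] for c in best) for k in range(3)])
    return ods, ois


counts = [
    [(8, 16, 10), (6, 8, 10)],  # image 1 at thresholds t0, t1
    [(9, 10, 10), (4, 5, 10)],  # image 2 at thresholds t0, t1
]
ods, ois = ods_ois(counts)
```

In this toy case the two images prefer different thresholds, so OIS (30/38) is higher than ODS (34/46), which matches the pattern of OIS exceeding ODS in Table 2.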

Table 2, Figure 5, and Figure 6 show the performance of the proposed method compared with existing methods. According to the table, the proposed method (Ours-MS-5) achieved 0.830 for the ODS metric, 0.853 for the OIS metric, and 0.870 for the AP metric. On ODS, it outperforms CED, RHN, and BDCN-MS, the closest competitors among these methods, by 1.5, 1.3, and 0.2 percentage points, respectively. The same observation holds for the other methods, as well as for the Precision-Recall curves in Figure 5, where the curve of the proposed method is more stable.
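The ODS margins can be recomputed directly from the Table 2 values. The short sketch below expresses each gap in percentage points, that is, the difference in ODS multiplied by 100; the variable names are hypothetical.

```python
# ODS values copied from Table 2 (CED taken from its higher-scoring row).
ods = {"Ours-MS-5": 0.830, "CED": 0.815, "RHN": 0.817, "BDCN-MS": 0.828}

gaps = {
    name: round((ods["Ours-MS-5"] - score) * 100, 1)
    for name, score in ods.items()
    if name != "Ours-MS-5"
}
# gaps == {"CED": 1.5, "RHN": 1.3, "BDCN-MS": 0.2}
```

On this reading of the table, the margins over CED, RHN, and BDCN-MS are 1.5, 1.3, and 0.2 percentage points.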

For the edge maps illustrated in Figure 6, the results obtained with the proposed method are more refined, with a higher resolution of the detected edges, than those of the RCF, BDCN, and PidiNet methods, while the CED results are better and close to ours. For example, the second result shows an accurate detection compared with the ground-truth image, even though the image contains regions that could be classified as edges. The high resolution of the obtained results is the benefit of using the refined network described above together with batch normalization.

Figure 7: Precision-Recall curves of the proposed models compared with the best state-of-the-art methods on the NYUD dataset (x-axis: Recall; y-axis: Precision). Legend: [F=.77] Ours, [F=.75] BDCN, [F=.75] RHN, [F=.74] RCF, [F=.73] PidiNet, [F=.72] HED.

4.3. NYUD Dataset evaluation

NYUDv2 is another edge detection dataset. It contains two sets of 1449 images: one of RGB images and one of the same images with a depth representation. The same data augmentation process was applied, with three operations: rotation (4×), flipping (2×), and scaling (3×). The proposed method was trained on the RGB set, the depth set (HHA), and the combination of the two sets (RGB+HHA). The same metrics used on BSDS500, namely ODS and OIS, were used to evaluate the proposed method as well as the state-of-the-art methods.

Table 3: Comparison of edge detection methods on the NYUD dataset.

Method | RGB ODS | RGB OIS | HHA ODS | HHA OIS | RGB-HHA ODS | RGB-HHA OIS | FPS
HED [27] | 0.720 | 0.734 | 0.682 | 0.695 | 0.746 | 0.761 | 62
LPCB [28] (2018) | 0.739 | 0.754 | 0.707 | 0.719 | 0.762 | 0.778 | -
RCF [30] (2019) | 0.743 | 0.757 | 0.703 | 0.717 | - | - | 20
BDCN [32] (2019) | 0.748 | 0.763 | 0.707 | 0.719 | 0.765 | 0.781 |
PiDiNet [35] (2021) | 0.733 | 0.747 | 0.715 | 0.728 | 0.756 | 0.773 | 62
RHN [36] (2021) | 0.751 | 0.762 | 0.711 | 0.721 | 0.772 | 0.789 | 24
Ours-3 | 0.729 | 0.745 | 0.718 | 0.731 | 0.750 | 0.774 | 51
Ours-MS-3 | 0.756 | 0.769 | 0.743 | 0.755 | 0.776 | 0.778 | 56
Ours-MS-5 | 0.774 | 0.795 | 0.771 | 0.793 | 0.791 | 0.805 | 34

Table 4: Comparison of edge detection methods on the Multicue dataset.

Method | Boundary ODS | Boundary OIS | Edge ODS | Edge OIS | FPS
RCF [30] | - | - | 0.857 | 0.862 | 15
BDCN [32] | 0.836 | 0.846 | 0.891 | 0.898 | 9
PiDiNet [35] | 0.818 | 0.830 | 0.855 | 0.860 | 17
RHN [36] | 0.841 | 0.856 | 0.896 | 0.905 | -
Ours-Ms-5 | 0.859 | 0.863 | 0.907 | 0.922 | 23

For evaluating the proposed method, we used a maximum tolerance of 0.011 instead of the 0.0075 used for the BSDS500 dataset, because NYUD images are larger than BSDS500 images. We also performed the training, testing, and evaluation of the proposed model on the three parts of the dataset: the RGB set, the HHA set, and the RGB+HHA set. Table 3 reports the results obtained with the single-scale and multi-scale (MS) models compared with state-of-the-art methods on the three parts, while Figure 7 illustrates the Precision-Recall curves of these methods on the RGB set. From Table 3, we can see that the proposed method outperforms the other methods on both ODS and OIS. With MS-3 and MS-5, the proposed method achieved 0.756 and 0.774 for the ODS metric on the RGB set; per Table 3, the MS-5 score exceeds BDCN by 2.6, PidiNet by 4.1, and RHN by 2.3 percentage points. This observation also holds on the HHA set and the RGB+HHA set. Figure 7 supports the results in Table 3: the Precision-Recall curve generated by the proposed method is more stable than those of the state-of-the-art methods.

Figure 8: Some results of the detected edges from the NYUD dataset. (Panel labels: Image, Predicted edges.)
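The two tolerance values can be made more tangible by converting them to pixels. The sketch assumes the common benchmark convention in which the maximum matching distance is a fraction of the image diagonal, and it assumes image sizes of 481×321 for BSDS500 and 560×425 for NYUD; neither the convention nor the sizes are stated in the text.

```python
import math


def tolerance_in_pixels(max_dist, width, height):
    """Convert a tolerance given as a fraction of the image diagonal into pixels."""
    return max_dist * math.hypot(width, height)


# Image sizes below are assumptions for illustration, not values from the text.
bsds_px = tolerance_in_pixels(0.0075, 481, 321)
nyud_px = tolerance_in_pixels(0.011, 560, 425)
```

Under these assumptions the tolerance is roughly 4.3 pixels on BSDS500 and roughly 7.7 pixels on NYUD, so a predicted edge pixel may lie somewhat farther from the ground truth on NYUD and still count as a match.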

In addition to the quantitative results in Table 3, Figure 8 illustrates some results obtained with the proposed method on the NYUD dataset. The visualized results show the quality of the detected edges as well as the ability of the proposed method to detect edge regions without 'ghosts', that is, unexpected edges in regions that do not contain any edge.

4.4. Multicue Dataset evaluation

Multicue is another edge detection dataset. It contains 100 real-world images of natural scenes and is considered a challenging benchmark. Multicue is labeled with both edges and boundaries. The dataset has been augmented with flipping (2×), scaling (3×), and rotation (16×) operations. The proposed method has been evaluated using the ODS and OIS metrics. Table 4 presents the results obtained by each method for the edge and boundary labels. From the table, we can see that the proposed method and the RHN method reach the best ODS and OIS values for both edge and boundary representations. The proposed method is better than RHN by 1% for both ODS and OIS on the boundary label, and by 1% for ODS and 2% for OIS on the edge label.
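The augmentation factors quoted above can be made concrete with a small enumeration. The sketch below assumes that the three operations are combined multiplicatively, and the scale values and evenly spaced rotation angles are illustrative assumptions, since the text states only the factors (2×, 3×, 16×).

```python
from itertools import product

# Factors from the text: flipping (2x), scaling (3x), rotation (16x).
FLIPS = (False, True)
SCALES = (0.5, 1.0, 1.5)  # assumed values; only the count (3) is given
ANGLES = tuple(i * 360.0 / 16 for i in range(16))  # assumed evenly spaced


def augmentation_configs():
    """List every (flip, scale, angle) combination as a dictionary."""
    return [
        {"flip": flip, "scale": scale, "angle": angle}
        for flip, scale, angle in product(FLIPS, SCALES, ANGLES)
    ]


if __name__ == "__main__":
    configs = augmentation_configs()
    # Number of augmented variants generated from one source image.
    print(len(configs))
```

Under these assumptions, each source image yields 2 × 3 × 16 = 96 augmented variants.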

5. Conclusion

This paper proposes a rened edge detection Method for detecting the

contour boundaries of image content. The proposed method consists of de-

tecting edges beneting from the multi-scale representation of the network as

well as conserving the high resolution of the output map. Unlike the existing

methods, this is ensured by using the interconnection between consecutive

parts of the network, exploiting the rened batch normalization layer, and

involving the output of the rst layers in the nal results using fusion oper-

ation. After that, we evaluated the proposed method on three competitive

datasets. The obtained results are compared with the best edge detection

methods. The obtained results and the achieved promising performance rates

in the dierent scenarios. Using the proposed method with 3 blocks (Ours-3

and Ours-MS-3), 86 frame per second (FPS) can be processed and with the

obtained performance accuracy the use of the proposed method is suitable

for real-time it useful for semantic edge segmentation as a perspective for the

next work plan. Since with a basic camera, we can record at 60 FPS in Full

HD quality.
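The real-time argument reduces to a per-frame time budget, which the following sketch makes explicit. The helper names are hypothetical, and the comparison assumes that the reported 86 FPS was measured at an input resolution comparable to the camera stream, which the text does not state.

```python
def frame_time_ms(fps: float) -> float:
    """Time available (or needed) per frame, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def headroom_ms(model_fps: float, camera_fps: float) -> float:
    """Spare time per frame when the model runs faster than the camera."""
    return frame_time_ms(camera_fps) - frame_time_ms(model_fps)


if __name__ == "__main__":
    # Values from the text: 86 FPS for the model, 60 FPS for the camera.
    print(f"model frame time:  {frame_time_ms(86):.2f} ms")
    print(f"camera frame time: {frame_time_ms(60):.2f} ms")
    print(f"headroom:          {headroom_ms(86, 60):.2f} ms")
```

A positive headroom means each frame is processed before the next one arrives, which is the sense in which 86 FPS is sufficient for a 60 FPS source.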

References

[1] S. Guiming, S. Jidong, Multi-scale harris corner detection algorithm

based on Canny edge-detection, in: 2018 IEEE International Confer-

ence on Computer and Communication Engineering Technology, 2018,

pp. 305–309.

[2] Hallman, S., and Fowlkes, C. C. (2015). Oriented edge forests for bound-

ary detection. In Proceedings of the IEEE conference on computer vision

and pattern recognition (pp. 1732-1740).

[3] Lopez-Molina, C., Galar, M., Bustince, H., and De Baets, B. (2014). On the impact of anisotropic diffusion on edge detection. Pattern Recognition, 47(1), 270-281.

[4] S. Zheng, Z. Tu, and A. Yuille. Detecting object boundaries using low-, mid-, and high-level information. In CVPR, 2007.

[5] P.A. Flores Vidal, G. Villarino, D. Gómez, J. Montero, A new edge detection method based on global evaluation using supervised classification algorithms, Int. J. Comput. Intell. Syst. 12 (2019) 367–378.

[6] J. Mairal, M. Leordeanu, F. Bach, M. Hebert, and J. Ponce. Discriminative sparse image models for class-specific edge detection and image interpretation. In ECCV, 2008.

[7] X. Ren and L. Bo. Discriminatively trained sparse code gradients for contour detection. In NIPS, 2012.

[8] J. Lim, C. L. Zitnick, and P. Dollar. Sketch tokens: A learned mid-level

representation for contour and object detection. In CVPR, 2013.

[9] Romani, L., Rossini, M., & Schenone, D. (2019). Edge detection meth-

ods based on RBF interpolation. Journal of Computational and Applied

Mathematics, 349, 532-547.

[10] Mittal, M., Verma, A., Kaur, I., Kaur, B., Sharma, M., Goyal, L. M., ... & Kim, T. H. (2019). An efficient edge detection approach to provide better edge connectivity for image analysis. IEEE Access, 7, 33240-33255.

[11] Ji, G. P., Zhu, L., Zhuge, M., & Fu, K. (2022). Fast camouflaged object detection via edge-based reversible re-calibration network. Pattern Recognition, 123, 108414.

[12] Xu, X., Chen, J., Zhang, H., Han, G. (2022). SA-DPNet: Structure-

aware dual pyramid network for salient object detection. Pattern Recog-

nition, 127, 108624.

[13] Lin, J., Cai, Y., Hu, X., Wang, H., Yuan, X., Zhang, Y., ... Van Gool, L. (2022). Coarse-to-fine sparse transformer for hyperspectral image reconstruction. arXiv preprint arXiv:2203.04845.

[14] Wang, B., Chen, L. L., & Zhang, Z. Y. (2019). A novel method on the

edge detection of infrared image. Optik, 180, 610-614.

[15] Sert, E., & Avci, D. (2019). A new edge detection approach via neutrosophy based on maximum norm entropy. Expert Systems with Applications, 115, 499-511.

[16] Le, M., & Kayal, S. (2021, July). Revisiting Edge Detection in Con-

volutional Neural Networks. In 2021 International Joint Conference on

Neural Networks (IJCNN) (pp. 1-9). IEEE.

[17] Orhei, C., Bogdan, V., Bonchis, C., & Vasiu, R. (2021). Dilated Filters

for Edge-Detection Algorithms. Applied Sciences, 11(22), 10716.

[18] Elharrouss, O., Akbari, Y., Almaadeed, N., & Al-Maadeed, S. (2022).

Backbones-Review: Feature Extraction Networks for Deep Learn-

ing and Deep Reinforcement Learning Approaches. arXiv preprint

arXiv:2206.08016.

[19] Y. Wang, X. Zhao, and K. Huang. Deep crisp boundaries. In CVPR,

2017.

[20] Jing, J., Liu, S., Wang, G., Zhang, W., & Sun, C. (2022). Recent ad-

vances on image edge detection: A comprehensive review. Neurocomput-

ing.

[21] Bertasius, G., Shi, J., & Torresani, L. (2015). Deepedge: A multi-scale

bifurcated deep network for top-down contour detection. In Proceedings

of the IEEE conference on computer vision and pattern recognition (pp.

4380-4389).

[22] Cai, Y., Wang, Z., Luo, Z., Yin, B., Du, A., Wang, H., ... & Sun, J.

(2020, August). Learning delicate local representations for multi-person

pose estimation. In European Conference on Computer Vision (pp. 455-

472). Springer, Cham.

[23] Deng, R., & Liu, S. (2020, October). Deep structural contour detection.

In Proceedings of the 28th ACM international conference on multimedia

(pp. 304-312).

[24] D. R. Martin, C. C. Fowlkes, and J. Malik, “Learning to detect natural

image boundaries using local brightness, color, and texture cues,” IEEE

Trans. Pattern Anal. Mach. Intell., vol. 26, no. 5, pp. 530–549, 2004.

[25] P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik, “Contour detection

and hierarchical image segmentation,” IEEE Trans. Pattern Anal. Mach.

Intell., vol. 33, no. 5, pp. 898–916, 2011.

[26] P. Dollár and C. L. Zitnick, “Fast edge detection using structured forests,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 8, pp. 1558–1570, 2015.

[27] Xie, S., & Tu, Z. (2015). Holistically-nested edge detection. In Pro-

ceedings of the IEEE international conference on computer vision (pp.

1395-1403).

[28] Ruoxi Deng, Chunhua Shen, Shengjun Liu, Huibing Wang, and Xinru

Liu. Learning to predict crisp boundaries. In ECCV, pages 562–578, 2018.

[29] Park, Y. H., Seo, J., & Moon, J. (2020). Cafenet: class-agnostic few-shot

edge detection network. arXiv preprint arXiv:2003.08235.

[30] Liu, Y., Cheng, M. M., Hu, X., Wang, K., & Bai, X. Richer Convo-

lutional Features for Edge Detection, in IEEE Transactions on Pattern

Analysis and Machine Intelligence, vol. 41, no. 8, pp. 1939-1946, 1 Aug.

2019, doi: 10.1109/TPAMI.2018.2878849.

[31] Wang, L., Shen, Y., Liu, H., & Guo, Z. (2019). An accurate and efficient multi-category edge detection method. Cognitive Systems Research, 58, 160-172.

[32] He, Jianzhong, et al. “Bi-directional cascade network for perceptual edge detection.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.

[33] Sun, K., Xiao, B., Liu, D., Wang, J. (2019). Deep high-resolution rep-

resentation learning for human pose estimation. In Proceedings of the

IEEE/CVF conference on computer vision and pattern recognition (pp.

5693-5703).

[34] Le, T., & Duan, Y. (2020). REDN: a recursive encoder-decoder network

for edge detection. IEEE Access, 8, 90153-90164.

[35] Su, Z., Liu, W., Yu, Z., Hu, D., Liao, Q., Tian, Q., ... & Liu, L. (2021). Pixel difference networks for efficient edge detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 5117-5127).

[36] Al-Amaren, A., Ahmad, M. O., & Swamy, M. N. S. (2021). RHN: A

Residual Holistic Neural Network for Edge Detection. IEEE Access, 9,

74646-74658.

[37] Li, O., & Shui, P. L. (2021). Color edge detection by learning classification network with anisotropic directional derivative matrices. Pattern Recognition, 118, 108004.

[38] Soria, X., Pomboza-Junez, G., Sappa, A. D. (2022). LDC: Lightweight

Dense CNN for Edge Detection. IEEE Access, 10, 68281-68290.

[39] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez,

A. N., ... & Polosukhin, I. (2017). Attention is all you need. Advances in

neural information processing systems, 30.

[40] Ramachandran, P., Parmar, N., Vaswani, A., Bello, I., Levskaya, A., &

Shlens, J. (2019). Stand-alone self-attention in vision models. Advances

in Neural Information Processing Systems, 32.

[41] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X.,

Unterthiner, T., ... & Houlsby, N. (2020). An image is worth 16x16

words: Transformers for image recognition at scale. arXiv preprint

arXiv:2010.11929.

[42] Pu, M., Huang, Y., Liu, Y., Guan, Q., & Ling, H. (2022). EDTER: Edge

Detection with Transformer. In Proceedings of the IEEE/CVF Confer-

ence on Computer Vision and Pattern Recognition (pp. 1402-1412).
