IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 70, 2021 5010015
Prior Mask R-CNN Based on Graph Cuts Loss and Size Input for Precipitation Measurement
Mingchun Li, Dali Chen, Shixin Liu, Member, IEEE, and Fang Liu
Abstract— Fusing prior knowledge with data-driven deep learning for measurement is interesting and challenging. For the detection of metallographic precipitates, the size and shape of the precipitates observed through a transmission electron microscope (TEM) are roughly predictable in advance. In this article, we propose a novel instance segmentation network named prior mask R-CNN that fuses such prior knowledge for automatic precipitate detection. On the basis of the typical mask R-CNN framework, we make the following improvements. First, at the bounding box classification stage, we feed in the box size alongside the uniform-dimension features produced by region of interest align (RoIAlign), restoring the area information. Second, at the mask segmentation stage, we propose a new loss function based on normalized graph cuts. It is category-sensitive, setting different weight strategies for different categories based on their prior shapes. In addition, from the point of view of practicality, we design an effective measurement extraction module to obtain specific measurements, such as the length of precipitates, from the final prediction results of our network. In a variety of experiments, our method achieves the highest mean average precision (mAP) of 0.475 and 0.298 among several well-known methods for the bounding box detection and mask segmentation tasks, respectively, which proves its effectiveness.
Index Terms— Graph cuts, instance segmentation, metallographic image, precipitation detection, prior knowledge.
I. INTRODUCTION
Precipitates are nanoscale microstructures of alloy materials that play key roles in the mechanical properties of products, such as yield strength, ultimate tensile strength, and elongation. As a result, it is extremely important to measure the precipitates accurately. In this article, we mainly focus on six-series aluminum alloy. It is an excellent structural material due to its good formability, corrosion resistance, weldability, and low cost [1]. In order to investigate
the nanoscale microstructures in materials, a transmission
electron microscope (TEM) is typically used [2]. When we
Manuscript received December 25, 2020; revised April 12, 2021; accepted
April 29, 2021. Date of publication May 6, 2021; date of current version
May 19, 2021. This work was supported in part by the National Key Research
and Development Program of China under Grant 2017YFB0306400 and
in part by the National Natural Science Foundation of China under Grant
61773104. The Associate Editor coordinating the review process was
Mohamad Forouzanfar. (Corresponding author: Dali Chen.)
Mingchun Li, Dali Chen, and Shixin Liu are with the College of
Information Science and Engineering, Northeastern University, Shenyang
110819, China (e-mail: 407996328@qq.com; chendali@ise.neu.edu.cn;
sxliu@mail.neu.edu.cn).
Fang Liu is with the School of Materials Science and Engi-
neering, Northeastern University, Shenyang 110819, China (e-mail:
liufang@smm.neu.edu.cn).
Digital Object Identifier 10.1109/TIM.2021.3077996
observe the aluminum alloy with TEM under the standard
setting, we find that the precipitates are embedded in the alloy
matrix (aluminum) in horizontal, vertical, and longitudinal
directions with expected size, as shown in Fig. 1.
From the point of view of materials science, these microstructures are described as needle-shaped precipitates (horizontal and longitudinal) and dot-shaped precipitates (vertical), respectively, and are of significant research value [3]. However, the precipitates observed in TEM images are often ambiguous, and their contours are not as obvious as in natural images, so traditional computer vision methods struggle to measure the precipitates directly. For practical production and academic research, materials scientists need to manually measure the precipitates in these three directions for each specimen, which is time-consuming and tedious.
Fortunately, in recent years, with the development of computer vision, methods based on deep learning have achieved remarkable results in image classification [4], boundary detection [5], object detection [6], and image segmentation [7].
In fact, in the measurement field for materials science, deep
learning methods are also widely used. For example, for
steel inner microstructures, Azimi et al. [8] designed a fully
convolutional neural network (FCNN) under a novel max
voting strategy to obtain pixel-level segmentation of marten-
site, tempered martensite, bainite, and pearlite. For nickel-
based superalloy, Wang et al. [9] used typical U-Net to get
information of precipitates and established a microstructure-
related hardness model according to the segmentation results.
For aluminum alloy, Li et al. [10] employed generative adver-
sarial network (GAN) and multitask learning to achieve the
detection of the second phase particles and grain bound-
aries. These deep learning methods tend to perform better
than the traditional machine learning methods (support vec-
tor regression [11], shallow neural network [12], and mean
shift [13]) or rule-based methods (automatic thresholding [14],
level set [15], graph cuts [16], and ultimate opening [17]),
especially when dealing with more complex measurement
tasks. Therefore, in view of the complexity of the TEM images in this article, we design an instance segmentation framework based on deep learning to accurately detect precipitates in the alloy.
As a natural extension of the object detection task, instance
segmentation aims to predict pixelwise object instance seg-
mentation and object category [18]. In recent years, it has
been widely used in the field of measurement, for example,
1557-9662 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Fig. 1. Stereogram display of the three directions (horizontal, vertical, and longitudinal) of precipitations embedded in aluminum alloy.
detection and diagnosis of electrical equipment based on infrared [19] and natural [20] images. In fact, the major instance segmentation frameworks are based on proposal segmentation, following the great success of R-CNN [21]. A typical example is mask R-CNN [22]. It consists
of two parts: region proposal network (RPN) and region of
interest network. Based on box regression and classification
involved in the second stage of faster R-CNN [23], it added
an additional mask prediction branch. At that time, it achieved
top performance for the MS COCO instance segmentation
task [24]. The success of this method is attributed to the excel-
lent performance of a fully convolution network (FCN) [25]
in segmentation tasks and the effectiveness of gradient prop-
agation under the region of interest align (RoIAlign) layers.
Following this route, in recent years, more instance segmenta-
tion methods have been proposed. For the instance segmentation pipeline, Liu et al. [26] proposed PAFPN, which enhances the entire feature hierarchy at the RPN stage and links feature grids through adaptive feature pooling. For
the topological structure of instance segmentation framework,
cascade mask R-CNN [27], hybrid task cascade [28], and
mask scoring R-CNN [29] set up an extra block to improve
performance through cascade structures [27], [28] and quality
score modules [29]. It is worth noting that these methods
do not pay attention to the combination of prior knowledge.
This is understandable, of course, because these methods are
proposed for natural images, in which the information (size
or shape) of different categories is diffuse and unpredictable.
However, when there is obvious and predictable knowledge of
categories, such as the shape of metal microstructures, how to
effectively use prior knowledge is worth further consideration.
From the perspective of the training strategy, the most
intuitive way to employ prior knowledge is based on transfer
learning, which is widely used in a variety of deep learning
frameworks [30]. Transfer learning can augment the target domain with relevant source-domain data, so as to obtain a network with better generalization ability and performance. Typical paradigms include fine-tuning parameters [31], domain adaptation [32], and so on.
It is one of the important tools to solve few-shot learning
from the perspective of data [33]. It implicitly transfers prior
knowledge to the network through relevant data. Besides that,
some recent studies showed that prior knowledge could be
fused to the end-to-end network more explicitly. For example,
when the shape of the foreground is known in the segmentation
task, Mirikharaji and Hamarneh [34] proposed a novel loss
term to encode the object shape and embed it into the loss
function to punish the predicted shape that does not satisfy
the prior knowledge. Han et al. [35] designed convex shape
sensitive loss function through a simple ergodic formula to
improve the robustness of the deep network to the noise and
reflection for pupil segmentation. When the location of each
category is known in advance, Zotti et al. [36] took the cardiac distribution in 3-D position as prior knowledge and merged it with the feature map before the classification layer in the network to improve the performance of cardiac segmentation.
In addition to the training strategy, we can also modify the
structure of the network to fuse prior knowledge. Specifically,
for the image segmentation task, an obvious knowledge is that
the prediction results should be smooth and continuous. This
is very important for a dense prediction network, considering
that the network is usually trained pixelwise. In order to
address this problem, Chen et al. [37] designed an additional
conditional random field (CRF) module as postprocessing for
the prediction of a deep network to improve the localization
performance. Similarly, Zheng et al. [38] integrated the complex energy function inference process of CRF into the running logic of recurrent neural networks (RNNs), so as to realize the
end-to-end training with additional position and color informa-
tion. In fact, for the typical black-box model, such structural
modification can effectively make up for the deficiency of deep
learning and transfer some common and general knowledge to
the network to ensure the rationality of the prediction results.
In this article, our contributions can be summarized as
follows.
1) We proposed a two-stage instance segmentation frame-
work called prior mask R-CNN for automatic metallographic
precipitation measurement of aluminum alloys. At the RoI
network stage, we input the size information of each object for
recovering the area information between different categories
after the RoIAlign layer.
2) A new loss function based on normalized graph cuts is
proposed. By assigning weights in the graph based on different
rules for each category, we designed a shape-sensitive cut loss
function and embed it into the mask training period with the
original cross-entropy loss function, meanwhile.
3) We developed a simple postprocessing module to extract
the measurement information from the prediction results based
on the region-growing algorithm. This module could effec-
tively obtain quantitative information about the precipitates,
which plays a key role from the perspective of practical
applications.
II. METHODOLOGY
Here, we clearly point out that our task is to measure
the precipitates in TEM images of aluminum alloys. The
categories of precipitates can be divided into three types
according to their growing direction: horizontal precipitates,
vertical precipitates, and longitudinal precipitates (see Fig. 1).
First, we get the instance segmentation from the metallo-
graphic image by the proposed prior mask R-CNN. Second,
for the prediction results, we set up a postprocessing module to
obtain the specific measurement of each kind of precipitate.
In this section, we introduce the proposed methodology in detail. Section II-A introduces the topology
structure of prior mask R-CNN and the specific size input
link. Section II-B demonstrates the cut loss function that
includes shape prior knowledge based on graph cuts. The
postprocessing module used to extract precipitate information
would be presented in Section II-C.
A. Structure of Prior Mask R-CNN
In this work, our ultimate goal is to help materials scientists obtain the information of interest from metallographic images. This information mainly refers to size statistics of the precipitates, which can reveal the mechanical
properties of the alloy. Specifically, different from image
classification [39], [40] and semantic segmentation [41], our
task needs to detect each precipitate in the image and measure
it one by one. That is to say, it involves object detection [42]
and segmentation in turn. From the perspective of computer
vision, this is a typical instance segmentation task [43], [44].
Considering the challenge of specific noises (such as occlu-
sion, interference, and distortion) in the TEM images, on the
basis of the mask R-CNN framework, we introduce the size
information, and its specific structure is shown in Fig. 2.
In general, our network mainly consists of three parts:
backbone, RPN, and RoI network. The backbone is deployed
to obtain the abstract features of metallographic images by a
hierarchic convolution operation. On the basis of these con-
volution feature maps, RPN is employed to provide proposals
that may contain foreground at the first stage, and the RoI
network is used to fine-tune the proposal results and get the
mask segmentation at the second stage. Specifically, for the backbone, we adopt ResNet50 with feature pyramid networks (FPNs). After successive stride operations in ResNet50, a series of feature maps with different sizes is obtained. In the FPN, these feature maps are gradually fused, and finally, we obtain feature maps at five scales (x_{1-5}). At the stage of RPN, we set up three kinds of anchors with different length-to-width ratios. Here, we perform foreground detection on each of the five feature-map scales separately, instead of applying multiscale anchors to fused features. That is to say, the loss function of RPN is the sum of the box classification and regression losses over the five feature maps:

L_{RPN} = L_{rpn\_cls} + L_{rpn\_reg} = \sum_{i=1}^{5} \left[ L_{rpn\_cls}(x_i) + L_{rpn\_reg}(x_i) \right]  (1)

where x_i denotes the feature map at the ith scale.
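As a minimal illustration of (1), the total RPN loss is simply a sum of per-scale classification and regression terms. The per-scale loss values below are placeholder numbers, not the paper's actual loss computations:

```python
# Illustrative sketch of the RPN loss in Eq. (1): the total loss is the sum of
# box classification and regression losses over the five FPN scales.
# The per-scale loss values are placeholders, not the paper's implementation.

def rpn_loss(cls_losses, reg_losses):
    """Sum classification and regression losses over all scales, Eq. (1)."""
    assert len(cls_losses) == len(reg_losses)
    return sum(c + r for c, r in zip(cls_losses, reg_losses))

# Placeholder per-scale losses for the five feature maps x1..x5.
cls_losses = [0.5, 0.4, 0.3, 0.2, 0.1]
reg_losses = [0.2, 0.2, 0.1, 0.1, 0.1]
total = rpn_loss(cls_losses, reg_losses)
```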
The last part of the framework is the RoI network. It is
used to classify objects and segment each instance. Its input
involves two parts: the proposal from RPN and the feature
maps extracted based on the backbone. Among them, the effect
of the backbone is to extract deep features through learned
hierarchical convolution. The effect of RPN is to provide
candidate boxes that might contain the desired object from
the image. In order to obtain more accurate boundary boxes
of objects, nonmaximum suppression (NMS) based on IoU is
employed [22]. It could reject a region if it has an IoU overlap
with a higher scoring selected region larger than a certain
threshold. After NMS, we obtain multiple instance-level feature maps by cropping and resizing the image-level features guided by the boxes. These feature maps are fed into the regression layer, used to fine-tune the bounding box, and the classification layer, used to identify the object category.
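The IoU-based suppression rule described above can be sketched as follows; the greedy loop and threshold are the standard formulation, while the box format and values are illustrative assumptions:

```python
# Minimal sketch of IoU-based nonmaximum suppression: a box is rejected if its
# IoU with an already-selected, higher-scoring box exceeds a threshold.
# Boxes are (x1, y1, x2, y2); values are made-up examples.

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def nms(boxes, scores, thresh=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
# The second box overlaps the first heavily, so it is suppressed.
```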
After the bounding box correction at the RoI network stage, we obtain the refined feature maps by RoIAlign and feed them into the mask layer for object segmentation. Different from
the fully connected network at the classification and regression
layer, the segmentation layer is based on the convolution
network. The total loss of the RoI network could be expressed
as follows:
L_{RoI} = L_{cls} + L_{reg} + L_{mask}.  (2)
It is worth noting that the RoIAlign operation will change
the feature map of different scales into a unified scale through
bilinear interpolation. This operation can effectively transform
the objects of different scales to a uniform size, which is
necessary for the following network of classification and
segmentation tasks. In essence, RoIAlign [22] is standard
Fig. 2. Structure of prior mask R-CNN.
operations for extracting a small, fixed-size feature map from each proposal box, regardless of the box size. That is to say, after the RoIAlign operation, both large and small boxes are converted into normalized feature maps of the same size by interpolation. There is no doubt that such an operation will
greatly damage the scale information of the object, especially
when the scale has a clear correlation with the object category.
To solve this scale damage problem, we assume that the sizes
of precipitates are related to the manufacturing process and
could be predicted when accurate precipitate information is
obtained. For example, the average size of the horizontal precipitates is 1046.7 nm² in Fig. 3. That is, the object category is highly dependent on the box size. However, all the feature
maps after RoIAlign would be the same size, and their original
scale information would be discarded. Therefore, we make the
following structural adjustment that constitutes our method’s
novelty and effectiveness.
For the classification (cls) layer and regression (reg) layer
of RoI, we adopt a four-layer fully connected network. First, we flatten the 3-D feature map (channel × height × width) into a 1-D feature vector. Next, the flattened features are fed into two fully connected layers
whose outputs are gradually reduced in turn. In the output
layer, we fuse the size information as an additional input
(green neural unit) with the output of the second fully con-
nected layer. Here, we point out that the size is calculated
based on the area of the boundary box, which is readily
available through a simple product operation from the RPN
results. Finally, the classification layer and the regression
layer based on the shared features, respectively, predict the
category of the object and the bounding box location. Specifically, in Fig. 3, classes H, V, and L refer to the horizontal, vertical, and longitudinal precipitates, respectively; t_{cx}, t_{cy}, t_{cw}, and t_{ch} are the offsets of the detected box for category c. So far, through a simple skip connection,
we realized the size input to help the model fuse scale
information without additional high computational cost.
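The size-input skip connection described above can be sketched as follows. All layer sizes and values are illustrative assumptions, far smaller than a real implementation, and the two-layer head stands in for the paper's four-layer fully connected network:

```python
import numpy as np

# Hypothetical sketch of the size-input skip connection: the flattened RoI
# features pass through fully connected layers, then the proposal box area is
# concatenated as one extra input (the "green neural unit") before the shared
# output feeds the classification/regression layers.

rng = np.random.default_rng(0)

def fc_relu(x, w, b):
    # one fully connected layer with ReLU activation
    return np.maximum(x @ w + b, 0.0)

roi_feat = rng.standard_normal((1, 7 * 7 * 32))            # flattened RoIAlign output
w1, b1 = 0.01 * rng.standard_normal((7 * 7 * 32, 128)), np.zeros(128)
w2, b2 = 0.01 * rng.standard_normal((128, 64)), np.zeros(64)

h = fc_relu(fc_relu(roi_feat, w1, b1), w2, b2)             # shared FC features

box = (10.0, 20.0, 50.0, 60.0)                             # RPN proposal (x1, y1, x2, y2)
area = np.array([[(box[2] - box[0]) * (box[3] - box[1])]]) # box size input: 40 * 40
fused = np.concatenate([h, area], axis=1)                  # skip connection

# "fused" then feeds the classification layer (classes H, V, L) and the
# regression layer predicting (t_cx, t_cy, t_cw, t_ch).
```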
B. Loss Function of Prior Mask R-CNN
Once the topology of the network is determined, we need
to consider specific learning objectives. The objective function
of the optimization problem is more commonly called the
loss function for machine learning. It plays a key role in
deep learning, which determines how to guide the parameters
in the model to update. Whether for supervised learning or
unsupervised learning, it is very important to set a reasonable
loss function. In this section, we will introduce the loss
function involved in prior mask R-CNN in detail.
In general, the loss function in prior mask R-CNN is mainly
involved at the RPN stage and the RoI network, which are
shown as (1) and (2), respectively. In (2), we find that the
loss function of the RoI network consists of three parts: classification loss L_{cls}, regression loss L_{reg}, and segmentation loss L_{mask}. In view of the structure of the total loss function,
it could be regarded as multitask learning [45], which aims
to leverage valuable knowledge that is involved in related
tasks to improve the whole performance of the network. First,
we show the classification loss and the regression loss of the
RoI network for one sample, as follows:
L_{cls} = k_c \sum_{c=1}^{C} -g_c \log(s_c)  (3)

L_{reg} = k_c \sum_{c=1}^{C} \sum_{i=1}^{I} g_c \left| t^*_{ci} - t_{ci} \right|  (4)

where k_c is the weight of category c (horizontal, vertical, and longitudinal precipitates), g_c ∈ {0, 1} is the class-level binary ground truth, s_c ∈ [0, 1] is the class-level prediction after the activation function, t^* = (t^*_{cx}, t^*_{cy}, t^*_{cw}, t^*_{ch}) is the real offset set between the ground truth and the region of interest, and t = (t_{cx}, t_{cy}, t_{cw}, t_{ch}) is the predicted offset set.
In (3) and (4), we add the weight k_c for categories on top of the standard cross-entropy loss and L1-norm loss functions. The reason is that the numbers of the different kinds of precipitates are obviously different. In fact, unlike the unpredictability of objects in natural landscape images, each metallographic image in our dataset contains all three kinds of precipitates with different amounts at the same time. For example, in a TEM image, the number of vertical precipitates observed is always much larger than that of longitudinal or horizontal precipitates. Therefore, inspired by the class-balanced strategy [46], we use the weight k_c to alleviate the sample imbalance problem for the object classification and
Fig. 3. Proposed box size input (green neural unit) by skipping connection in the RoI network.
bounding box regression tasks. Specifically, k_c is equal to the total number of objects divided by the number of objects of category c.
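The class-balancing rule is easy to sketch: k_c is the total object count divided by the per-category count, so rare categories receive larger weights. The counts below are made-up examples:

```python
# Sketch of the class-balancing weight k_c described above:
# k_c = (total number of objects) / (number of objects of category c).
# The per-category counts are made-up examples.

counts = {"horizontal": 20, "vertical": 120, "longitudinal": 10}
total = sum(counts.values())
k = {c: total / n for c, n in counts.items()}
# The rare category ("longitudinal" here) receives the largest weight.
```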
Besides, for the mask layer in the RoI network, the task is to segment the region of interest. Different from the previous classification and regression layers, it performs dense prediction based on a convolutional network. Here, we directly present the mask loss function for metallographic precipitates in prior mask R-CNN:
L_{mask} = αL_{CE} + βL_B + γL_{Cut}  (5)

where L_{CE} is the typical cross-entropy loss, L_B is the boundary loss proposed in [47], L_{Cut} is the cut loss proposed in this article, and α = 1, β = 1.5, and γ = 1.5 are the weights of these three losses.
Among them, the cross-entropy loss LCE is the most
common loss function in segmentation tasks. It guides the
network learning by calculating the cross entropy of each pixel
independently based on ground truth. It can be expressed as
follows:
L_{CE} = \sum_{c=1}^{C} -G_c · \log(S_c)  (6)

where G_c ∈ {0, 1}^N is the binary ground-truth vector of length N for category c, N is the total number of pixels in the image, S_c ∈ [0, 1]^N is the predicted value vector after the sigmoid function for category c, and · denotes the vector inner product.
In fact, by its nature, the cross-entropy loss in (6) treats image segmentation as many isolated pixel classification problems, which is somewhat inconsistent with the human visual system. To alleviate this problem, many novel designs have been proposed to compensate for the cross-entropy loss. For example, one can directly optimize the evaluation metric to improve the performance of the model. Specifically, in [48], the Dice loss
function, based on the Dice similarity coefficient, is proposed to solve the imbalance between foreground and background voxels in medical images. In fact, for the class
imbalance problem of image segmentation, the Dice loss
function is widely employed for various deep networks and
respective tasks. Besides V-Net [48], in [49], it is used to
train a fully convolutional densenet for diffusion-weighted
images. Similarly, for vessel segmentation in X-ray coronary angiography image sequences [50], the Dice loss
is selected as a loss function to train an encoder–decoder
framework with a channel attention mechanism to tackle the
class imbalance problem. Similarly, the Hausdorff distance
used to quantify the difference between two sets is also
encoded as a simple loss function, which is estimated by
three approximate methods in [51]. In addition, the perceptual
loss from the deep network, the loss function based on the
region, and the energy-based loss function are also widely
used to solve their respective problems [52]. No doubt, it is
important to design an appropriate loss function according
to the specific situation. Considering that the morphology of precipitates in TEM metallographic images is predictable (vertical precipitates are dot-shaped, whereas horizontal and longitudinal precipitates are needle-shaped with different directions), which is effective prior knowledge, we propose a cut loss based on normalized graph cuts to compensate for the cross-entropy loss.
Graph cuts are an effective unsupervised image segmentation method based on graph clustering. For the binary classification of foreground and background in image I, let the point set of the foreground be A and that of the background be B, that is, A ∩ B = ∅ and A ∪ B = I. The graph cut can be expressed as

Cut(A, B) = \sum_{u ∈ A, v ∈ B} w(u, v)  (7)

where w(u, v) indicates the designed weight between u and v.
Specifically, to improve the performance of segmentation,
a regularized and extended version of the cut measurement
named normalized graph cuts [53] can be written as

NCut(A, B) = \frac{Cut(A, B)}{assoc(A, I)} + \frac{Cut(A, B)}{assoc(B, I)}  (8)

where assoc(A, I) = \sum_{u ∈ A, t ∈ I} w(u, t) indicates the sum of the weights between the points in A and all the points in image I; assoc(B, I) is defined analogously.
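Equations (7) and (8) can be checked directly on a toy graph; the symmetric weight matrix and the partition below are made-up examples, not values from the paper:

```python
import numpy as np

# Toy illustration of Eqs. (7)-(8) on a 4-node graph: nodes {0, 1} form set A
# and nodes {2, 3} form set B. W is a made-up symmetric weight matrix.

W = np.array([[0.0, 1.0, 0.1, 0.0],
              [1.0, 0.0, 0.1, 0.1],
              [0.1, 0.1, 0.0, 1.0],
              [0.0, 0.1, 1.0, 0.0]])

A, B = [0, 1], [2, 3]
cut = sum(W[u, v] for u in A for v in B)              # Cut(A, B), Eq. (7)
assoc_A = sum(W[u, t] for u in A for t in range(4))   # assoc(A, I)
assoc_B = sum(W[v, t] for v in B for t in range(4))   # assoc(B, I)
ncut = cut / assoc_A + cut / assoc_B                  # NCut(A, B), Eq. (8)
```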
From the perspective of optimization, once the weight matrix W is determined, the objective function based on normalized graph cuts is

\min_{V_c} \sum_{c=1}^{C} g_c \frac{V_c^T W_c (1 - V_c)}{D_c^T V_c}  (9)

where W_c ∈ R^{N×N} is the weight matrix for category c, g_c ∈ {0, 1} is the class-level binary ground truth representing the categories contained in image I, D_c = W_c 1 is the vector of row sums of W_c, and V_c ∈ {0, 1}^N is the decision variable.
Here, we expand the binary classification task between foreground and background into a multiclass problem, which is more appropriate for our task. Obviously, for the optimization problem in (9), the decision variable V_c is the final segmentation result based on graph cuts. An effective method to solve this kind of problem is to transform it into an eigenvalue problem based on the Rayleigh quotient [53]. Furthermore, if we relax the hard constraint on V_c from {0, 1}^N to the soft [0, 1]^N, it can be regarded as the probability output S_c of a deep network after the sigmoid activation function. In other words, we transform the direct optimization of the decision variables into the optimization of the network parameters in deep learning. Inspired by [54], such an optimization problem can be used as a loss function in a deep network and solved iteratively by the gradient-based backpropagation algorithm. The proposed cut loss function and its gradient can be written as follows:
L_{Cut} = \sum_{c=1}^{C} g_c \frac{S_c^T W_c (1 - S_c)}{D_c^T S_c}  (10)

\frac{∂L_{Cut}}{∂θ} = \frac{∂}{∂θ} \sum_{c=1}^{C} g_c \frac{S_c^T W_c (1 - S_c)}{D_c^T S_c} = \sum_{c=1}^{C} g_c \left[ \frac{S_c^T W_c S_c D_c}{(D_c^T S_c)^2} - \frac{2 W_c S_c}{D_c^T S_c} \right] \frac{∂S_c}{∂θ}  (11)

where θ indicates the parameters of the deep network.
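The cut loss (10) and the bracketed gradient term in (11) can be verified numerically on a small example; the weight matrix and soft prediction below are random toy values, with a single category and g_c = 1:

```python
import numpy as np

# Toy check of Eq. (10) and the gradient in Eq. (11) with respect to the soft
# prediction S, assuming a symmetric weight matrix W with zero diagonal and a
# single category (g_c = 1). All values are random examples.

rng = np.random.default_rng(1)
N = 6
W = rng.random((N, N))
W = (W + W.T) / 2.0          # symmetric weights
np.fill_diagonal(W, 0.0)
D = W.sum(axis=1)            # D = W 1 (row sums)
S = rng.random(N)            # soft prediction in [0, 1]^N

def cut_loss(s):
    return s @ W @ (1.0 - s) / (D @ s)   # Eq. (10), one category

# Analytic gradient from Eq. (11): (S^T W S) D / (D^T S)^2 - 2 W S / (D^T S).
grad = (S @ W @ S) * D / (D @ S) ** 2 - 2.0 * W @ S / (D @ S)

# Central finite-difference check of the same gradient.
eps = 1e-6
num = np.array([
    (cut_loss(S + eps * np.eye(N)[i]) - cut_loss(S - eps * np.eye(N)[i])) / (2 * eps)
    for i in range(N)
])
```

The finite-difference vector matches the closed form in (11), which confirms the sign simplification used there (the D_c/(D_c^T S_c) terms cancel because D_c^T S_c = S_c^T D_c).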
In (10) and (11), the category weight matrix W_c specifies the correlation strength between pixels, which largely determines the final segmentation result. Generally speaking, the weight matrix is symmetric, and each element needs to be calculated independently. A simple way to obtain the weight matrix is to use a kernel function based on pixel feature vectors, which is the popular way to construct the energy function in CRF [55]:

W_c^{i,j} = k(F_i, F_j) = e^{-(F_i - F_j)^T Σ_c^{-1} (F_i - F_j)}  (12)

where the pair i and j refers to the index position in the matrix W_c, F_i is the feature vector of pixel i (and likewise F_j), and Σ_c^{-1} indicates the inverse of the covariance matrix.
Considering the smoothness of the prediction and the difference between categories, the pixel features include the 2-D position information {X, Y} and the color information with three channels {R, G, B}. Here, for simplicity, we only consider the gray image with a single channel G. In other words, the feature vector F_i for pixel i is [x_i, y_i, g_i]. Next, we discuss the relationship between the weight matrix and prior knowledge.
In fact, when the shape of a particular category is known in advance, its position statistics are also predictable. This inspires the setting of the weight matrix in the cut loss function. As stated above, our task is to segment the precipitates in three different directions: horizontal, vertical, and longitudinal. The precipitates with different shapes have their own position statistics. For example, for horizontal precipitates, the position axes X and Y are negatively correlated. In contrast, X and Y in longitudinal precipitates are always positively correlated. In fact, this phenomenon still holds after the RoIAlign of the prior mask R-CNN framework because RoIAlign does not change the sign of the correlation between the position axes X and Y. An example is shown in Fig. 4.
In Fig. 4, we show the weight matrix of the central pixel, which can be obtained by reshaping the middle row or column of the category weight matrix W_c. First, it should be pointed out that, after RoIAlign of the feature map, we need to additionally crop the raw image with the same bounding box proposal to obtain the color information, as shown in the upper left of Fig. 4. Here, we set three kinds of correlation degrees τ_xy = {+1.5, 0, −1.5} with the color channel to illustrate the difference of the weight matrix under the respective strategies. Among them, the color channel focuses on the gray difference, whereas the position focuses on smoothness. The weight matrix of Color Channel + XY Channel (τ_xy = +1.5) (lower middle) is the closest to the binarization matrix based on the ground truth, which is also the most consistent with the position statistics of the horizontal precipitate. This shows that, when we know the shape, and more specifically the location statistics, of a category in advance, the prior knowledge can be encoded through the covariance matrix Σ_c involved in the weight matrix W_c and thus fused into the loss function L_{Cut}. Here, we directly give the inverse of the matrix Σ_c, which can be used in (12):
Σ_c^{-1} = \begin{bmatrix} τ_p & τ_p τ_{cxy} & 0 \\ τ_p τ_{cxy} & τ_p & 0 \\ 0 & 0 & τ_g \end{bmatrix}  (13)

where τ_p = 1/10² and τ_g = 1/16² are the weights for the position and color channels, and τ_{cxy} = {+2.5, 0, −2.5} refers to the correlation degree, which can be selected according to the opposite of the covariance sign of each category (horizontal, vertical, and longitudinal).
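The kernel in (12) with the inverse covariance of (13) can be sketched as follows. The τ values follow the text; the pixel coordinates and gray levels are made-up examples:

```python
import numpy as np

# Sketch of the kernel weight in Eq. (12) using the inverse covariance from
# Eq. (13). Feature vectors are [x, y, g] (2-D position plus gray value).
# The tau values follow the text; the two pixels are made-up examples.

tau_p, tau_g = 1.0 / 10**2, 1.0 / 16**2
tau_cxy = -2.5     # e.g., the setting for one category with anticorrelated X and Y

Sigma_inv = np.array([[tau_p,           tau_p * tau_cxy, 0.0],
                      [tau_p * tau_cxy, tau_p,           0.0],
                      [0.0,             0.0,             tau_g]])

def kernel(Fi, Fj):
    d = np.asarray(Fi, float) - np.asarray(Fj, float)
    return np.exp(-d @ Sigma_inv @ d)   # Eq. (12)

w = kernel([3.0, 4.0, 120.0], [5.0, 2.0, 118.0])
# Pairs whose displacement matches the preferred direction get larger weights.
```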
Fig. 4. Weight matrix of the central pixel under different strategies.
Objectively speaking, scholars have proposed many different loss functions for segmentation tasks. According to the taxonomy in [56], our cut loss can be regarded as a region-based loss, in contrast to the pixelwise CE loss. It is worth mentioning that the cut loss does not require pixel-level annotations, which is an interesting characteristic. It can be
seamlessly integrated into weakly supervised learning [57].
We only need class-level annotations to indicate the shape. The
category information will be encoded in the weight matrix Wc,
which combines the original image information. Specifically,
by setting different covariance, we can specify the smoothness
preference to maintain the prediction shape of the category.
From the perspective of computational efficiency, unlike the
complex inference process in CRF based on energy function
minimization, our loss function can be easily realized through
gradient backward propagation in deep learning, as shown in
(11). Of course, it is still time-consuming to calculate the huge
weight matrix Wc. However, since the raw images and the
feature maps are reduced to a smaller scale after RoIAlign,
the computational power loss is acceptable.
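Since (10) is not reproduced in this excerpt, the following is only a sketch of a differentiable normalized-cuts-style loss in the spirit of the proposed cut loss: the classic soft relaxation cut(A, Ā)/assoc(A, V) over a pixel affinity matrix, realized with plain matrix products so that gradients flow through the soft masks. The function name and the dense-matrix layout are ours.

```python
import numpy as np

def soft_ncut_loss(probs, W):
    """Soft normalized-cut loss over one flattened RoI.

    probs: (C, N) soft class assignments; W: (N, N) pixel affinity matrix.
    Returns the sum over classes of cut(A, not A) / assoc(A, V), which is
    minimized when each class forms a strongly connected region.
    """
    degree = W.sum(axis=1)                    # W @ ones, per-pixel degree
    loss = 0.0
    for s in probs:                           # s: (N,) soft mask of a class
        assoc = s @ degree                    # s^T W 1
        cut = s @ W @ (1.0 - s)               # s^T W (1 - s)
        loss += cut / (assoc + 1e-8)          # eps guards empty classes
    return loss
```

In a deep framework the same expression would be written on tensors, so the gradient with respect to the predicted soft masks is obtained by ordinary backpropagation, consistent with the discussion around (11).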
In addition to the statistical characteristics of the position,
the contour is a more explicit and intuitive descriptor for
the shape of the object. When the pixel-level annotation is
available, it is meaningful to measure the predicted boundary
∂p against the boundary ∂g in the ground truth for transferring the
shape information. Here, we use the boundary loss proposed
in [47] as another loss function of the mask layer to improve
the performance. We directly show the final nonsymmetric
L2 distance result after approximation as follows:

$$L_B = \sum_{c=1}^{C} g_c\,(\phi_G \cdot S_c) \tag{14}$$

where φG ∈ ℝᴺ is a distance vector that can be calculated in
advance with the same shape as Sc.
In (14), every element in φG represents the signed distance
between the current pixel and the nearest real boundary in the
ground truth. Specifically, if the current pixel is inside the
ground-truth region, the sign of the distance is negative. Otherwise,
it is positive. In other words, in order to minimize the boundary
loss LB, we need to simultaneously maximize the predicted values
for the pixels with negative φG (the foreground) and minimize the
predicted values for the pixels with positive φG (the background).
This is in line with perceptual
cognition. It should be noted that, unlike conventional loss
functions, such as cross entropy, its results may be negative
due to the approximation in the mathematical derivation that is
used for simplified calculation. However, this does not affect
its effectiveness in a deep network, which has been verified in
medical imaging [47]. In fact, its initial aim is to measure
the distance between two curves by integrating pixels on
the boundary. In view of differentiable requirements and the
limitation of computational power, the loss function is simpli-
fied as the inner product of two vectors after approximation.
To some extent, it is equivalent to an L1-norm loss function
with pixelwise weight, which implies useful information about
boundaries and shapes.
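Under these definitions, φG and the boundary loss (14) can be sketched for a single class with a signed Euclidean distance transform. This is a minimal sketch following the construction in [47]; in the article the loss operates on per-RoI softmax probability maps, and the function names here are ours.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def signed_distance(gt):
    """phi_G: signed distance to the ground-truth boundary.

    Negative inside the ground-truth region, positive outside.
    If there is no foreground at all, phi_G is defined as zeros here.
    """
    gt = gt.astype(bool)
    if not gt.any():
        return np.zeros(gt.shape)
    # edt(~gt): distance of background pixels to the nearest foreground;
    # edt(gt): distance of foreground pixels to the nearest background.
    return distance_transform_edt(~gt) - distance_transform_edt(gt)

def boundary_loss(prob, gt):
    """Inner product of the probability map with phi_G; may be negative."""
    phi = signed_distance(gt)
    return float((phi * prob).mean())
```

Because φG is negative inside the ground truth, a prediction that fills the foreground drives the loss below zero, which matches the remark above that this loss, unlike cross entropy, can take negative values.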
At this point, we have completed the design of the seg-
mentation loss function of the mask layer in the proposed
prior mask R-CNN. In general, we set up three loss functions:
pixelwise cross-entropy loss LCE in (6), boundary loss LB
in (14) based on boundary (shape) measurement, and the
proposed cut loss LCut in (10) considered shape statistical
characteristics, as shown in Fig. 5.
Besides, in metallographic images, noise exists objectively
and can be divided into three categories: occlusion (inclination
fringes), interference (dislocation), and distortion (residual).
They are caused by the observation process, the specimen
itself, and the preparation, respectively. Among them, noise
caused by occlusion affects performance the most. More
seriously, considering the imaging principle of TEM, occlusion
(equal inclination or thickness fringes) is very common in
images. When the precipitates appear near the occlusions,
the model needs to be able to repair the occluded part.
From the perspective of noise suppression, our designed loss
function (see Fig. 5) could fill in the occluded part effectively
Fig. 5. Loss function of the mask layer in prior mask R-CNN.
Algorithm 1 Postprocessing for Information Extraction
Input: mask list M = [m1, m2, ..., mp], category list C = [c1, c2, ..., cp]
Output: precipitate length list L = [l1, l2, ..., lp]
Initialize list L = []
For i in range(p):
    Extract mi from mask list M and ci from category list C;
    Get region instances ri by the region-growing algorithm based on mi;
    Filter out non-maximum areas in ri;
    Detect eight corner key points (p1, p2, ..., p8) in ri;
    If ci == horizontal:
        li = Mean[Dist(p1, p8), Dist(p2, p7), Dist(p3, p6), Dist(p4, p5)]
    elif ci == longitudinal:
        li = Mean[Dist(p1, p4), Dist(p2, p3), Dist(p5, p8), Dist(p6, p7)]
    else:
        li = Mean[Dist(p1, p5), Dist(p2, p6), Dist(p3, p7), Dist(p4, p8)]
    Append li to L
Return precipitate length list L
by introducing the prior knowledge of shape and contour,
which is a significant improvement.
C. Postprocessing Module for Measurement
For the analysis of precipitates in this article, our ultimate
aim is to help material scientists measure the precipitates,
rather than solve a pure computer vision problem. The mea-
surement here refers to the statistical information of the three
kinds of precipitates, such as the distribution or mean value
of precipitates’ length, which is very important to reveal the
mechanical properties of the alloy. Therefore, from the practi-
cal point of view, we design a postprocessing module to extract
valuable information on the basis of instance segmentation.
Fig. 6. Flowchart of the postprocessing module.
Specifically, from the output of the computer vision network
to measurement acquisition, we mainly face two inevitable
problems. First, the segmentation results of network output
are not always connected, considering that it is obtained
by aggregating each pixel prediction independently. In other
words, a predicted mask may contain multiple isolated regions
that are treated as precipitates at the same time, which might be
caused by the visual noise in TEM images. Second, how to obtain
robust and reliable length information from irregular connected
domains is another problem that needs attention. In view
of the above problems, we designed a simple postprocessing
module based on region growing and key points’ detection for
measuring the precipitations, and the implementation details
are shown in Algorithm 1.
In brief, Algorithm 1 mainly includes a region-growing
algorithm, area filter, key points’ detection, and category-
wise length measurement. First, we use the seeded region-
growing algorithm [58] to get the region instances ri. Under
eight-neighbor pixels’ strategy, the selected seed points (pre-
dicted as foreground pixels) are grown to get the instance of
the connected domain. Next, nonmaximum areas are filtered
out to eliminate the effect of noise and obtain the real precipi-
tates. Then, eight corner points are detected in turn by simple
maximum and minimum functions based on plane position.
We point out that these corners may coincide, considering
the irregular shape. Finally, we set up different distance
measurement rules according to the categories of precipitates,
which is consistent with the statement in materials science. For
example, for horizontal precipitation, the length is based on
the long side. In contrast, for vertical precipitation, the length
refers to its diameter. An example of a postprocessing module
is shown in Fig. 6.
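The steps of Algorithm 1 can be sketched in Python as follows. This is a simplified sketch: connected-component labeling with an 8-neighborhood stands in for the seeded region-growing step, and the category-wise length rules are approximated with extreme points rather than the article's eight corner key points.

```python
import numpy as np
from scipy.ndimage import label

def precipitate_length(mask, category):
    """Approximate length (in pixels) of one predicted precipitate.

    mask: (H, W) binary prediction; category: 'horizontal', 'vertical',
    or 'longitudinal'. Returns 0.0 for an empty mask.
    """
    regions, n = label(mask, structure=np.ones((3, 3), dtype=int))  # 8-neighbor
    if n == 0:
        return 0.0
    sizes = np.bincount(regions.ravel())[1:]
    largest = regions == (1 + int(sizes.argmax()))   # filter non-maximum areas
    ys, xs = np.nonzero(largest)
    if category == 'vertical':
        # Roughly circular: average the axis-aligned extents as the diameter.
        return float((xs.max() - xs.min() + ys.max() - ys.min()) / 2 + 1)
    # Needle-like: largest distance between extreme points as the long side.
    pts = np.stack([xs, ys], axis=1).astype(float)
    extremes = pts[[xs.argmin(), xs.argmax(), ys.argmin(), ys.argmax()]]
    gaps = np.linalg.norm(extremes[:, None] - extremes[None, :], axis=-1)
    return float(gaps.max())
```

Multiplying the pixel length by the imaging scale (1 pixel : 0.15625 nm in this article) converts the result to nanometers.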
III. RESULTS
In this section, we will show a variety of experiments to
test the effectiveness of the proposed prior mask R-CNN for
the detection of precipitates in alloys.
A. Dataset
In this article, our experimental object is direct chill cast
Al–12.7Si–0.7Mg alloy without further chemical modification,
Fig. 7. Left: training samples (blue box) and test samples (red box) in one
slice. Right: overlapping samples (blue box) by sliding for augmentation.
which is widely used as structural materials. The dataset
contains 30 metallographic slices with the size of 2048 ×
2048 under different heat treatment conditions (such as dif-
ferent aging times and aging temperatures). It was observed
by transmission electron microscopy at a scale of 1 pixel:
0.15625 nm. For these metallographic images, we can find
a series of precipitates based on Mg and Si growing in three
orthogonal directions. Specifically, we call them horizontal,
vertical, and longitudinal precipitates, which are labeled at
pixel level by a material expert and seven volunteers in our
team. Here, we point out that the terms “horizontal” (about
+30° to the horizontal axis) and “longitudinal” (about −60°
to the horizontal axis) in this article are not strict. They are
only used to distinguish each other. After annotation, we divide
each slice into four parts and distribute them to the training
set and the test set, respectively. As with many deep learning
projects, we augment our metallographic dataset to expand
the training samples used to train the network. It should be
noted that not all typical image augmentation methods and
affine transformation are allowed in view of the clear material
science significance of precipitates in metallographic images.
For example, image rescaling or rotation may lead to weird
precipitates that cannot be observed in practice, at least under
the current TEM settings. In contrast, overlapping cutting
in the slice is allowed, as shown in Fig. 7. Finally, after
data augmentation, our training set contains 300 (90 raw +
210 augmented) images with the size of 1024 × 1024,
whereas the test set contains 30 images of the same size.
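The overlapping cutting of Fig. 7 amounts to a sliding-window crop over each slice. The sketch below illustrates the idea; the 512-pixel stride is an illustrative choice, not a value stated in the article.

```python
def overlapping_crops(slice_size=2048, crop=1024, stride=512):
    """Top-left corners (y, x) of overlapping crops from one square slice.

    A stride smaller than the crop size yields overlapping samples,
    which augments the training set without the rescaling or rotation
    that would distort the physical meaning of the precipitates.
    """
    starts = range(0, slice_size - crop + 1, stride)
    return [(y, x) for y in starts for x in starts]

# One 2048x2048 slice with stride 512 yields a 3x3 grid of 1024x1024 crops.
```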
B. Instance Segmentation Results
First, we introduce some quantitative indicators used to
evaluate our method. For the instance segmentation task,
generally speaking, the performance of the model is evaluated
from two aspects: object detection and mask segmentation.
Among them, for object detection tasks, a common evaluation
index is the mean average precision (mAP). It is popular for
natural image tasks, such as MS COCO [24] and PASCAL
VOC challenge [59], as follows:
$$\mathrm{mAP} = \frac{1}{C}\sum_{c=1}^{C} \mathrm{AP}_c = \frac{1}{C}\sum_{c=1}^{C} \frac{1}{N_c} \sum_{TP=1}^{N_c} \max_{tp \ge TP} \frac{tp}{tp + \mathrm{FP}(tp)} \tag{15}$$

where Nc means the total number of instances for category c,
tp means the number of truly detected objects, and FP(tp) is
a specific function that returns the number of falsely detected
objects; specifically, if tp samples can be detected, the minimum
number of false positives is returned; otherwise, infinity is
returned.

TABLE I
HYPERPARAMETERS INVOLVED IN OUR METHOD
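The per-category AP in (15) can be computed by sweeping the score-sorted detection list and, for each target count of true positives TP, taking the best achievable precision. This is a sketch under our reading of FP(tp) in (15); the function name and input convention are ours.

```python
import numpy as np

def average_precision(is_tp, n_gt):
    """AP of one category following (15).

    is_tp: detections sorted by descending score, True for true positives;
    n_gt: number of ground-truth instances N_c. For each target TP = 1..N_c,
    the best precision tp / (tp + FP(tp)) over all operating points reaching
    that count is taken; unreachable counts contribute 0 (FP = infinity).
    """
    is_tp = np.asarray(is_tp, dtype=bool)
    tp = np.cumsum(is_tp)                 # true positives up to each rank
    fp = np.cumsum(~is_tp)                # false positives up to each rank
    prec = tp / (tp + fp)
    ap = 0.0
    for target in range(1, n_gt + 1):
        reachable = prec[tp >= target]
        ap += reachable.max() if reachable.size else 0.0
    return ap / n_gt
```

Averaging this quantity over the C categories gives the mAP reported in Table II.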
Next, we will do experiments to test the performance of the
proposed prior mask R-CNN. The specific hyperparameters of
this work are shown in Table I. Overall, we basically inherited
the typical settings in mmdetection [60]. For example, in each
training iteration, up to 256 anchors and 512 RoI are randomly
selected to guide RPN and RoI network learning, respectively.
The maximum number of proposal boxes from RPN is 1000,
and the NMS threshold for the positive sample is set to 0.7.
Considering the specificity of our task, some hyperparameters
need to be adjusted accordingly. In view of the small size
of vertical precipitates, we set the scale of the anchor to 4
to ensure that they can be fully detected. As for the learning
strategy, we use Adam [61] with a learning rate of 0.0003,
and the training epoch is set to 15. After the 15th training
epoch, 30 test images that did not appear in the training period
would be used to evaluate the performance. Instances with the
predicted probabilities higher than 0.3 will be considered valid,
and based on this, the quantitative index will be calculated
by (15). In order to further verify the effectiveness of the pro-
posed method, mask R-CNN [22], mask scoring R-CNN [29],
and cascade mask R-CNN [27] are also considered under the
same dataset as a comparison. The quantitative indicators are
shown in Table II.
In Table II, we list the mean mAP and each category
AP at the same time, where the subscripts H, V, and L
refer to horizontal, vertical, and longitudinal precipitations.
The definition of AR is based on the same rule, and the
bold number of each column is the best performance of the
corresponding evaluation index. Considering universality and
fairness, mAP is selected as the main index to comprehensively
evaluate methods.
On the whole, the performances for object detection (upper
of Table II) are generally better than that for mask segmenta-
tion (lower of Table II). This degradation could be understood,
TABLE II
PERFORMANCE OF OBJECT DETECTION (UPPER) AND MASK SEGMENTATION (LOWER) AMONG MASK R-CNN, MASK SCORING R-CNN, CASCADE
MASK R-CNN, AND THE PROPOSED PRIOR MASK R-CNN
considering that mask segmentation is often based on the
results of object detection for the typical two-stage instance
segmentation framework. However, whether for object detec-
tion (upper of Table II) or mask segmentation task (lower of
Table II), the proposed prior mask R-CNN achieves better
performances in more evaluation indexes. Among them, for the
main index mAP of object detection, our algorithm achieves
the highest score of 0.475, which is ahead of 0.397 from mask
R-CNN, 0.447 from scoring R-CNN, and 0.378 from cascade
R-CNN. The situation of mask segmentation is basically the
same, and our method achieves the highest score of 0.298 for
the mask segmentation task. It is obvious that our method
should be more effective and appropriate for the detection and
segmentation of metallographic precipitates.
Besides, we observe significant differences in performance
among different categories. For example, in the upper of
Table II, the minimum APV for vertical precipitates of all
methods is 0.496 (mask R-CNN), whereas the maximum APH
for horizontal precipitates is only 0.356 (prior mask R-CNN).
This phenomenon is even more obvious in the mask segmen-
tation task. This is mainly due to the difficulty in predicting
the horizontal or longitudinal precipitates. In view of the
imaging principle of TEM, the horizontal and longitudinal
precipitates are often blurred with inexact contour compared
with the obvious dark gray vertical precipitates with a circular
shape, as shown in Fig. 1. In addition, for the prediction of
rectangle (needle) shape with a large length-to-width ratio of
horizontal and longitudinal precipitates, conventional convolu-
tion networks may encounter difficulties. It is worth noting that
this performance difference between categories is relatively
small for our proposed prior mask R-CNN. Fusing prior
knowledge into the deep network by our specific structure
(see Section II-A) and loss function (see Section II-B) might
alleviate the phenomenon. In addition, in order to compare
different methods more intuitively, we show prediction results
directly for the test set, just as in Fig. 8.
In Fig. 8, we selected three TEM metallographic images
to show the prediction results, which are realized by the
mmdetection toolbox [60]. The first row in Fig. 8 is the overall
prediction results of different methods, whereas the second
and third rows focus more on the boundary box detection
effect and mask segmentation result, respectively. Specifically,
for the second row in Fig. 8, our method predicts more
precipitates with higher scores, such as yellow horizontal
precipitates. This implies that our method is sensitive to the
complex precipitates, which is consistent with the high recall
rate (mAR = 0.586) in Table II. Unlike mask scoring R-CNN
and cascade mask R-CNN, which add an extra subnetwork to
the topology structure of mask R-CNN, our method only fuses
the size input through a simple skip connection (see Fig. 3).
This is helpful for scale-sensitive classification problems, such
as precipitate detection in this article, so our method can
effectively detect more precipitates accurately. Besides, for the
third row in Fig. 8, our mask segmentation results are closest
to the shape of annotated precipitates in the ground truth. This
might be related to two additional segmentation loss functions
in prior mask R-CNN. To be more precise, the proposed cut
loss (10) that contains prior knowledge guides the network to
produce a smooth and consistent mask, by setting different
weight matrixes according to statistical characteristics. The
boundary loss (14) further ensures the rationality of the
predicted shape by measuring the contour distance between
prediction and ground truth. All these specific settings enable
our method to achieve better results.
Furthermore, we point out that the selection of hyperpa-
rameters in the model is ad hoc without using a validation
set. It implies that the hyperparameters in Table I may not
be optimal. The main criteria for selecting them are based
on the specific situation of our task. For example, we set the
“Anchor Scale” as 4 to ensure that the vertical precipitates
with an average size of 26 nm² can be detected effectively.
The selections of the correlation coefficients τHxy = +2.5,
τVxy = 0, and τLxy = −2.5 in the cut loss are based
on the statistical knowledge from the currently available
dataset. In addition, practicability is also an important cri-
terion. We changed the “Threshold for Test” from a typical
0.05 to 0.3. This correction leads to the degradation of the
mAP score (from 0.503 to 0.475) but effectively reduces
the false positive rate, which is more valuable for material
experts. Of course, the settings of all methods in Table II
are basically consistent, except for some inherent structures
or loss functions (e.g., additional scoring layer in Scoring
R-CNN [29]). Under the above configuration, the performance
improvement of our prior mask R-CNN is relatively obvious,
just like Table II.
As mentioned above, different from the natural image
challenge, the evaluation index based on computer vision is not
the most important for the actual microstructure detection task.
From a practical point of view, our method should be able to
extract useful information from TEM images, which is helpful
Fig. 8. Prediction results between mask R-CNN, mask scoring R-CNN, cascade mask R-CNN, and the proposed prior mask R-CNN.
for material scientists to measure and analyze. Therefore,
in the following, we test the performance of the postprocessing
module for measurement proposed in this article. In order
to get the results more fairly, we selected three different
batches in the test set. These three batches are produced under
different heat treatments, specifically aging time, which is
meaningful to study the mechanical properties of the alloy.
That is to say, the difference between these test images is
even more obvious due to the different production processes
and inevitable changes in the environment. In addition, other
methods are also considered for comparison. The results are
shown in Table III.
In Table III, we show the average length of three kinds of
precipitates for three batches. H, V, and L refer to the hor-
izontal, vertical, and longitudinal precipitations, respectively.
The GT in the last row indicates the real annotated results by
experts, and the bold numbers in each column are the closest
results to the ground truth. Generally speaking, the post-
processing results based on prior mask R-CNN are more con-
sistent with the real results, no matter for the aging time of 1 or
12 h. This shows that the measurement results based on region
growing and key points’ detection (see Section II-C) can
accurately extract the material science information from the
network prediction results. It further proves the effectiveness
and robustness of our method. However, we must point out
that the accuracy under the current dataset is limited (maximum
error = 5 nm), which is not enough for material analysis.
However, with more accurate and fine-labeled metallographic
data, our method still has the potential to be used in the actual
production process.
It is worth mentioning that, in the actual production, the size
of the image may not be consistent with that of the images (1024 ×
1024) in this article. If we want to use the trained model to
predict images of different sizes directly, we need to manually
convert these images into the same scale (1 pixel:
0.15625 nm). In other words, our model is more sensitive to
scale than to size. This may be related to the mechanism of the
convolutional network. Furthermore, by converting the scale,
more metallographic images can be used for training or testing.
Considering the value of TEM images, this is very meaningful
compared with directly discarding these data.
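Bringing an image acquired at a different magnification to the training scale amounts to a simple resize-factor computation, as sketched below; the acquisition scale in the example is hypothetical.

```python
def rescale_size(width, height, src_nm_per_px, dst_nm_per_px=0.15625):
    """Pixel size an image must be resized to so that one pixel again
    covers dst_nm_per_px nanometers (the training scale in this article)."""
    factor = src_nm_per_px / dst_nm_per_px
    return round(width * factor), round(height * factor)

# An image taken at 0.3125 nm/px must be upsampled by a factor of 2
# before being fed to the trained model.
```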
C. Ablation Study
For our proposed prior mask R-CNN, we make two
improvements to the basic mask R-CNN framework, from the
network structure and loss function to more accurately detect
the precipitates in the alloy. Specifically, in terms of structure,
we introduce size input in the classification and regression
layer of the RoI network by skipping connection. For the loss
function, the weakly supervised loss function (10) based on
traditional graph cuts and the boundary loss function (14)
based on distance are used to segment in the mask layer
of the RoI network. These two improvements together make
our method achieve better performance, whether from the
perspective of computer vision or practical point, as shown
in Tables II and III. In this section, the specific role of these
two improvements will be analyzed in detail. Specifically,
under the same training settings and dataset, we make addi-
tional experiments on basic mask R-CNN with only structural
improvement and only loss function improvement for ablation
study. The final object detection performance, mask segmenta-
tion performance, and prediction results are shown in Table IV
and Fig. 9, respectively.
The term “Specific Structure” in Table IV corresponds
to the “Size Input Structure” in Fig. 9, which refers to the basic
TABLE III
AVERAGE LENGTH OF PRECIPITATES FOR THREE BATCHES (AGING TIME: 1, 3, AND 12 H) IN THE TEST SET AFTER THE INFORMATION EXTRACTION MODULE
TABLE IV
PERFORMANCE OF OBJECT DETECTION (UPPER) AND MASK SEGMENTATION (LOWER) OF BASIC MASK R-CNN, MASK R-CNN WITH SIZE INPUT
STRUCTURE, MASK R-CNN WITH ADDITIONAL CUT AND BOUNDARY LOSS, AND COMPLETE PRIOR MASK R-CNN
Fig. 9. Prediction results between basic mask R-CNN, mask R-CNN with size input, mask R-CNN with cut and boundary loss, and complete prior mask
R-CNN.
mask R-CNN with additional size input. Similarly, “Specific
Loss” corresponds to “Cut & Boundary Loss,” which indicates
the basic mask R-CNN with cut and boundary loss function.
The convention for bold numbers is the same as before, that is, the
best performance for each evaluation index. First, from the
quantitative results in Table IV, it is obvious that both “Specific
Structure” and “Specific Loss” could improve the performance
of basic mask R-CNN. In the object detection task, from
the mAP column of the main evaluation index in upper of
Table IV, we find that the improvement effect of “Specific
Structure” (from 0.397 to 0.446) is slightly better than that of
“Specific Loss” (from 0.397 to 0.445). On the contrary, the
improvement effect of “Specific Loss” (from 0.242 to 0.296)
is better than that of “Specific Structure” (from 0.242 to 0.264)
in the mask segmentation task based on the lower of Table IV.
This situation is similar to other indicators, such as mAR,
which is used to test the recall rate of the model.
This phenomenon should be consistent with the original
intention of the designed two improvements. Specifically, for
structural improvement, we input the size information to the
classification and regression layer at the object detection stage,
after observing the obvious difference in the size distribution
of different types of precipitates (see Fig. 3). This is helpful
for size-sensitive microstructure classification tasks. It leads
to the improvement of the performance of object detection
after the structural improvement of the basic mask R-CNN.
Besides, loss improvement is mainly designed for the mask
segmentation task. Based on the predictable shape of different
precipitates, we set the cut loss to produce a smooth and proper
prediction result with category-related preference. In addition,
because the model is based on the typical two-stage instance
segmentation framework, no matter for RPN or the classi-
fication layer, the regression layer, and the mask layer in
the RoI network, their inputs are from the same backbone
convolution network (ResNet50 +FPN in this article). That
is to say, the improvement of any branch may be linked.
This also explains why the results in Table IV tend to show
methodological relevance. For example, an additional cut loss
set in the mask layer is also significant for the improvement
of the object detection task, except for the deserved original
segmentation task.
Finally, we test the effect of the proposed cut loss, which
is an important contribution of this article. In short, the cut
loss function is a kind of segmentation loss function, which is
inspired by the graph cuts theory. Compared with pixel-level
annotation used in the cross entropy, the cut loss only needs
class-level annotation and corresponding statistical character-
istics, which is also adaptive to weakly supervised learning.
The statistical characteristics of different categories could be
regarded as prior knowledge. When objects with the same
category appear in desired and predictable shapes, our loss
function, which benefits from prior knowledge, is helpful for the
corresponding segmentation task. We note that the prior
knowledge is integrated into the loss function by setting the
corresponding weight matrix of categories. In the following,
in order to further test the effect of prior knowledge and cut
loss, we set three different weight matrices for longitudinal
precipitates by selecting different correlation degrees τLxy =
(−2, −2.5, −3) in (13). The final prediction results are shown
in Fig. 10.
In Fig. 10, the first column shows the overall prediction
results under the three correlation degrees, followed by the
results of the longitudinal precipitates and the enlarged view.
Intuitively, the outputs of the network are quite different.
Specifically, when τLxy =−2, the prediction shapes of
longitudinal precipitates are relatively blunt. However, when
we set τLxy =−3, the predicted shapes of that become sharp.
This shows that, with constant learning, the loss function could
control the shape of the prediction result. At the same time,
it also implies that the iterative method based on gradient
descent is effective to optimize the objective function of nor-
malized graph cuts to a certain extent. That is to say, by setting
the weight matrix that involves prior knowledge in cut loss,
we can control the shape of predicted segmentation. Different
from the cross-entropy loss based on the pixel level, this loss
function is more consistent with human visual perception. The
relevant prior knowledge is naturally integrated into the end-
to-end training period of the deep network without additional
Fig. 10. Prediction results for longitudinal precipitations under different
correlation degrees [τLxy = (−2, −2.5, −3)].
postprocessing modules or complex inference processes. More
importantly, the cut loss can be employed for any conventional
image segmentation network, besides the instance segmenta-
tion framework in this article. The cut loss function may be
an appropriate complement to the cross-entropy loss when the
shapes of categories are statistically significant.
D. Limitations
In general, based on deep learning, we proposed a novel
framework for the measurement of precipitates. There may be
some possible limitations in this study. First, from the per-
spective of experimental materials, our metallographic dataset
has only 300 nanometer-level TEM images. To be honest, it is
relatively small compared to popular natural image datasets,
such as MS COCO (≈330k). Considering the high cost
of specimen preparation and expert annotation, it is difficult
to obtain a large number of metallographic images. How
to achieve better performance in the current small dataset
is worth thinking about. Second, from the perspective of a
deep network, strictly speaking, the tuning of hyperparameters
depends on the performance of the validation set. However,
in our work, the selection of hyperparameters in the model
is ad hoc without using such a validation set. As a result,
these results may not be fully generalized. In fact, finding the
proper hyperparameters might be very difficult, especially for
the model that contains many hyperparameters to be set in
this work. Finally, with the introduction of the proposed cut
loss (10), the computational efficiency will inevitably decrease.
Of course, this degradation (from about 0.55 to 0.75 s/image)
is basically acceptable. In future work, in order to solve the
problem of generalization, some learning strategies are worthy
of attention, such as few-shot learning and transfer learning.
As for the selection of hyperparameters, automatic machine
learning [62] seems to be a good solution. These may be the
key to the practical application of our automatic measurement
methods in materials science.
IV. CONCLUSION
In this article, we proposed a novel framework for the
measurement of precipitates in aluminum alloys. It is a
two-stage instance segmentation network, which is based on
mask R-CNN and consists of the backbone network, RPN,
and the RoI network. For the RoI network, considering that
the size distributions of different precipitate categories have
obvious differences, we input the size information based on
boundary box area into the classification layer and regression
layer of the RoI network through a simple skip connection.
Besides, since the shape of precipitates is predictable, the
proposed cut loss function, including prior knowledge, and
the boundary loss function for measuring contour distance
are designed to segment the mask in the mask layer. In fact,
our framework improves the basic mask R-CNN in terms of
topological structure and loss, respectively, based on the prior
knowledge (size and shape) of the category. As a result, we call
the proposed framework prior mask R-CNN. From a practical
point of view, we design a simple postprocessing module
to extract material information based on the region-growing
algorithm and key points’ detection. As for the experiments,
our method achieves an mAP score of 0.475 in the object
detection task and an mAP score of 0.298 in the mask
segmentation task, which surpasses other comparison methods.
In addition, the length information of precipitates obtained
from the output of our network is more consistent with that
annotated by experts. This should be attributed to the designed
structure and loss function of our method. In the ablation study,
we tested these designs separately and explored the relevance
of the proposed cut loss to the predicted shape. In summary,
when the shapes and sizes of the objects are predictable in
advance, our framework, prior mask R-CNN, offers a new way
to improve automatic measurement performance by fusing prior
knowledge.
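As an illustration of the postprocessing idea summarized above (a simplified sketch under assumed inputs, not the exact module), seeded region growing on a predicted soft mask followed by a farthest-pair search over the grown region recovers a length measurement:

```python
import numpy as np
from collections import deque

def grow_region(mask, seed, thresh=0.5):
    """4-connected seeded region growing on a soft mask in [0, 1]."""
    h, w = mask.shape
    region = np.zeros((h, w), dtype=bool)
    q = deque([seed])
    while q:
        r, c = q.popleft()
        if 0 <= r < h and 0 <= c < w and not region[r, c] and mask[r, c] >= thresh:
            region[r, c] = True
            q.extend([(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)])
    return region

def length_from_region(region):
    """Approximate the object length as the largest pairwise distance
    between region pixels (the two key points of an elongated object)."""
    pts = np.argwhere(region).astype(float)
    dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    return dists.max()
```

The brute-force pairwise search is quadratic in the number of region pixels, which is acceptable for nanoscale precipitates occupying small RoIs; a convex-hull rotating-calipers step would be the standard optimization for larger regions.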
REFERENCES
[1] L. P. Troeger and E. A. Starke, “Microstructural and mechanical char-
acterization of a superplastic 6xxx aluminum alloy,” Mater. Sci. Eng.,
A, vol. 277, nos. 1–2, pp. 102–113, Jan. 2000.
[2] T. Hemalatha, S. Akilandeswari, T. Krishnakumar, S. G. Leonardi,
G. Neri, and N. Donato, “Comparison of electrical and sensing proper-
ties of pure, Sn- and Zn-doped CuO gas sensors,” IEEE Trans. Instrum.
Meas., vol. 68, no. 3, pp. 903–912, Mar. 2019.
[3] F. Liu, F. Yu, D. Zhao, and L. Zuo, “Microstructure and mechanical
properties of an Al-12.7Si-0.7Mg alloy processed by extrusion and heat
treatment,” Mater. Sci. Eng. A., vol. 528, pp. 3786–3790, Apr. 2011.
[4] K. Simonyan and A. Zisserman, “Very deep convolutional networks for
large-scale image recognition,” in Proc. Int. Conf. Learn. Represent.,
2015, pp. 1–14.
[5] Y. Liu et al., “Richer convolutional features for edge detection,” IEEE
Trans. Pattern Anal. Mach. Intell., vol. 41, no. 8, pp. 1939–1946,
Aug. 2019.
[6] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look
once: Unified, real-time object detection,” in Proc. IEEE Conf. Comput.
Vis. Pattern Recognit., Jun. 2016, pp. 779–788.
[7] V. Badrinarayanan, A. Kendall, and R. Cipolla, “SegNet: A deep
convolutional encoder-decoder architecture for image segmentation,”
IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 12, pp. 2481–2495,
Dec. 2017.
[8] S. M. Azimi, D. Britz, M. Engstler, M. Fritz, and F. Mücklich,
“Advanced steel microstructural classification by deep learning meth-
ods,” Sci. Rep., vol. 8, no. 1, pp. 1–14, Dec. 2018.
[9] C. Wang, D. Shi, and S. Li, “A study on establishing a microstructure-
related hardness model with precipitate segmentation using deep learning
method,” Materials, vol. 13, no. 5, p. 1256, Mar. 2020.
[10] M. Li, D. Chen, S. Liu, and F. Liu, “Grain boundary detection
and second phase segmentation based on multi-task learning and
generative adversarial network,” Measurement, vol. 162, Oct. 2020,
Art. no. 107857.
[11] K. Gajalakshmi, S. Palanivel, N. J. Nalini, S. Saravanan, and
K. Raghukandan, “Grain size measurement in optical microstruc-
ture using support vector regression,” Optik, vol. 138, pp. 320–327,
Jun. 2017.
[12] O. Dengiz, A. E. Smith, and I. Nettleship, “Grain boundary detection in
microstructure images using computational intelligence,” Comput. Ind.,
vol. 56, nos. 8–9, pp. 854–866, Dec. 2005.
[13] X. Zhenying, Z. Jiandong, Z. Qi, and P. Yamba, “Algorithm based
on regional separation for automatic grain boundary extraction using
improved mean shift method,” Surf. Topography, Metrology Properties,
vol. 6, no. 2, Apr. 2018, Art. no. 025001.
[14] H. Peregrina-Barreto, I. R. Terol-Villalobos, J. J. Rangel-Magdaleno,
A. M. Herrera-Navarro, L. A. Morales-Hernández, and
F. Manríquez-Guerrero, “Automatic grain size determination in
microstructures using image processing,” Measurement, vol. 46,
no. 1, pp. 249–258, Jan. 2013.
[15] B. Lu, M. Cui, Q. Liu, and Y. Wang, “Automated grain boundary
detection using the level set method,” Comput. Geosci., vol. 35, no. 2,
pp. 267–275, Feb. 2009.
[16] B. Ma et al., “Fast-FineCut: Grain boundary detection in micro-
scopic images considering 3D information,” Micron, vol. 116, pp. 5–14,
Jan. 2019.
[17] C. A. Paredes-Orta, J. D. Mendiola-Santibañez, F. Manriquez-Guerrero,
and I. R. Terol-Villalobos, “Method for grain size determination in
carbon steels based on the ultimate opening,” Measurement, vol. 133,
pp. 193–207, Feb. 2019.
[18] L. Liu et al., “Deep learning for generic object detection: A survey,”
Int. J. Comput. Vis., vol. 128, no. 2, pp. 261–318, Jan. 2020.
[19] B. Wang et al., “Automatic fault diagnosis of infrared insulator images
based on image instance segmentation and temperature analysis,” IEEE
Trans. Instrum. Meas., vol. 69, no. 8, pp. 5345–5355, Aug. 2020.
[20] J. Ma, K. Qian, X. Zhang, and X. Ma, “Weakly supervised instance
segmentation of electrical equipment based on RGB-T automatic anno-
tation,” IEEE Trans. Instrum. Meas., vol. 69, no. 12, pp. 9720–9731,
Dec. 2020.
[21] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature
hierarchies for accurate object detection and semantic segmentation,” in
Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2014, pp. 580–587.
[22] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask R-CNN,” in Proc.
IEEE Int. Conf. Comput. Vis., Oct. 2017, pp. 2961–2969.
[23] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards
real-time object detection with region proposal networks,” IEEE Trans.
Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, Jun. 2017.
[24] T.-Y. Lin et al., “Microsoft COCO: Common objects in context,” in Proc.
Eur. Conf. Comput. Vis., 2014, pp. 740–755.
[25] E. Shelhamer, J. Long, and T. Darrell, “Fully convolutional networks
for semantic segmentation,” IEEE Trans. Pattern Anal. Mach. Intell.,
vol. 39, no. 4, pp. 640–651, Apr. 2017.
[26] S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia, “Path aggregation network for
instance segmentation,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern
Recognit., Jun. 2018, pp. 8759–8768.
[27] Z. Cai and N. Vasconcelos, “Cascade R-CNN: High quality object
detection and instance segmentation,” IEEE Trans. Pattern Anal. Mach.
Intell., vol. 43, no. 5, pp. 1483–1498, May 2021.
[28] K. Chen et al., “Hybrid task cascade for instance segmentation,” in Proc.
IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019,
pp. 4974–4983.
[29] Z. Huang, L. Huang, Y. Gong, C. Huang, and X. Wang, “Mask scoring
R-CNN,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit.
(CVPR), Jun. 2019, pp. 6409–6418.
[30] S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Trans.
Knowl. Data Eng., vol. 22, no. 10, pp. 1345–1359, Oct. 2010.
[31] Y. Guo, H. Shi, A. Kumar, K. Grauman, T. Rosing, and R. Feris,
“SpotTune: Transfer learning through adaptive fine-tuning,” in Proc.
IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019,
pp. 4805–4814.
[32] Z. Murez, S. Kolouri, D. Kriegman, R. Ramamoorthi, and K. Kim,
“Image to image translation for domain adaptation,” in Proc. IEEE/CVF
Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 4500–4509.
[33] Y. Wang, Q. Yao, J. T. Kwok, and L. M. Ni, “Generalizing from a few
examples: A survey on few-shot learning,” ACM Comput. Surv., vol. 53,
no. 3, pp. 1–34, Jul. 2020.
[34] Z. Mirikharaji and G. Hamarneh, “Star shape prior in fully convolu-
tional networks for skin lesion segmentation,” in Proc. MICCAI, 2018,
pp. 737–745.
[35] S. Y. Han, H. J. Kwon, Y. Kim, and N. I. Cho, “Noise-robust pupil center
detection through CNN-based segmentation with shape-prior loss,” IEEE
Access, vol. 8, pp. 64739–64749, 2020.
[36] C. Zotti, Z. Luo, O. Humbert, A. Lalande, and P. M. Jodoin, “GridNet
with automatic shape prior registration for automatic MRI cardiac
segmentation,” in Proc. STACOM-MICCAI, 2017, pp. 73–81.
[37] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille,
“DeepLab: Semantic image segmentation with deep convolutional nets,
atrous convolution, and fully connected CRFs,” IEEE Trans. Pattern
Anal. Mach. Intell., vol. 40, no. 4, pp. 834–848, Apr. 2018.
[38] S. Zheng et al., “Conditional random fields as recurrent neural net-
works,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015,
pp. 1529–1537.
[39] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for
image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit.
(CVPR), Jun. 2016, pp. 770–778.
[40] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi, “Inception-v4,
Inception-ResNet and the impact of residual connections on learning,”
in Proc. AAAI Conf. Artif. Intell., 2016, pp. 4278–4284.
[41] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional net-
works for biomedical image segmentation,” in Proc. MICCAI, 2015,
pp. 234–241.
[42] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie,
“Feature pyramid networks for object detection,” in Proc. IEEE Conf.
Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 2117–2125.
[43] D. Bolya, C. Zhou, F. Xiao, and Y. J. Lee, “YOLACT: Real-time instance
segmentation,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV),
Oct. 2019, pp. 9157–9166.
[44] X. Chen, R. Girshick, K. He, and P. Dollár, “TensorMask: A foundation
for dense object segmentation,” in Proc. IEEE/CVF Int. Conf. Comput.
Vis. (ICCV), Oct. 2019, pp. 2061–2069.
[45] Y. Zhang and Q. Yang, “A survey on multi-task learning,” 2017,
arXiv:1707.08114. [Online]. Available: http://arxiv.org/abs/1707.08114
[46] S. Xie and Z. Tu, “Holistically-nested edge detection,” in Proc. IEEE
Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 1395–1403.
[47] H. Kervadec, J. Bouchtiba, C. Desrosiers, E. Granger, J. Dolz, and
I. B. Ayed, “Boundary loss for highly unbalanced segmentation,” in
Proc. Int. Conf. Med. Imag. Deep Learn., 2019, pp. 285–296.
[48] F. Milletari, N. Navab, and S.-A. Ahmadi, “V-Net: Fully convolutional
neural networks for volumetric medical image segmentation,” in Proc.
4th Int. Conf. 3D Vis. (3DV), Oct. 2016, pp. 565–571.
[49] R. Zhang et al., “Automatic segmentation of acute ischemic stroke
from DWI using 3-D fully convolutional DenseNets,” IEEE Trans. Med.
Imag., vol. 37, no. 9, pp. 2149–2160, Sep. 2018.
[50] D. Hao et al., “Sequential vessel segmentation via deep channel attention
network,” Neural Netw., vol. 128, pp. 172–187, Aug. 2020.
[51] D. Karimi and S. E. Salcudean, “Reducing the Hausdorff distance in
medical image segmentation with convolutional neural networks,” IEEE
Trans. Med. Imag., vol. 39, no. 2, pp. 499–513, Feb. 2020.
[52] J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time
style transfer and super-resolution,” in Proc. Eur. Conf. Comput. Vis.,
2016, pp. 694–711.
[53] J. Shi and J. Malik, “Normalized cuts and image segmentation,”
IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 8, pp. 888–905,
Aug. 2000.
[54] M. Tang, A. Djelouah, F. Perazzi, Y. Boykov, and C. Schroers,
“Normalized cut loss for weakly-supervised CNN segmentation,” in
Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018,
pp. 1818–1827.
[55] P. Krähenbühl and V. Koltun, “Efficient inference in fully connected
CRFs with Gaussian edge potentials,” in Proc. Adv. Neural Inf. Process.
Syst., 2011, pp. 109–117.
[56] J. Ma, “Segmentation loss odyssey,” 2020, arXiv:2005.13449. [Online].
Available: http://arxiv.org/abs/2005.13449
[57] Z.-H. Zhou, “A brief introduction to weakly supervised learning,” Nat.
Sci. Rev., vol. 5, no. 1, pp. 44–53, Jan. 2018.
[58] R. Adams and L. Bischof, “Seeded region growing,” IEEE Trans. Pattern
Anal. Mach. Intell., vol. 16, no. 6, pp. 641–647, Jun. 1994.
[59] M. Everingham, S. M. A. Eslami, L. Van Gool, C. K. I. Williams,
J. Winn, and A. Zisserman, “The Pascal visual object classes challenge:
A retrospective,” Int. J. Comput. Vis., vol. 111, no. 1, pp. 98–136,
Jan. 2015.
[60] K. Chen et al., “MMDetection: Open MMLab detection tool-
box and benchmark,” 2019, arXiv:1906.07155. [Online]. Available:
http://arxiv.org/abs/1906.07155
[61] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,”
in Proc. Int. Conf. Learn. Represent., 2015, pp. 1–41.
[62] X. He, K. Zhao, and X. Chu, “AutoML: A survey of the state-of-the-art,”
Knowl.-Based Syst., vol. 212, Jan. 2021, Art. no. 106622.
Mingchun Li received the B.S. and M.S. degrees in
automation from Northeastern University, Shenyang,
China, in 2015 and 2018, respectively, where he is
currently pursuing the Ph.D. degree with the College
of Information Science and Engineering.
His research lies at the intersection of machine
learning and image processing. His current research
work is about medical signals and industrial intelli-
gence based on deep learning.
Dali Chen received the B.S., M.S., and Ph.D.
degrees in automation, pattern recognition, and
intelligent systems from Northeastern University,
Shenyang, China, in 2003, 2005, and 2008, respec-
tively.
He is currently an Associate Professor with the
College of Information Science and Engineering,
Northeastern University. His research lies at the
intersection of machine learning and image process-
ing. His current research interest is to develop deep
learning algorithms for medical image processing
and industrial intelligent systems.
Shixin Liu (Member, IEEE) received the B.S.
degree in mechanical engineering from Southwest
Jiaotong University, Sichuan, China, in 1990, and
the M.S. and Ph.D. degrees in systems engineer-
ing from Northeastern University, Shenyang, China,
in 1993 and 2000, respectively.
He is currently a Professor with the College of
Information Science and Engineering, Northeastern
University. He has authored or coauthored over
100 publications, including one book. His research
interests are in intelligent optimization algorithms,
planning and scheduling, machine learning, and computer vision.
Fang Liu received the B.S. and Ph.D. degrees
in materials science from Northeastern University,
Shenyang, China, in 2004 and 2013, respectively.
She is currently a Lecturer with the School
of Materials Science and Engineering, Northeast-
ern University. Her current research interests are
wrought aluminum–silicon alloy and alloy design
based on finely dispersed second-phase particles
strengthening matrix.