IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 70, 2021 5010015
Prior Mask R-CNN Based on Graph Cuts Loss and
Size Input for Precipitation Measurement
Mingchun Li, Dali Chen, Shixin Liu, Member, IEEE, and Fang Liu
Abstract—Fusing prior knowledge with data-driven deep
learning for measurement is interesting and challenging. For
the detection of metallographic precipitations, the measurements
of size and shape of precipitations are roughly predictable in
advance through a transmission electron microscope (TEM).
In this article, we proposed a novel instance segmentation
network named prior mask R-CNN by fusing prior knowledge
for automatic precipitation detection. On the basis of the typical
mask R-CNN framework, we made the following improvements.
First, at the bounding box classification stage, in order to restore
area information, we input the size information besides only
uniform dimension features after the region of interest align
(RoIAlign). Second, at the mask segmentation stage, we pro-
posed a new loss function based on normalized graph cuts.
It is category-sensitive by setting different weight strategies for
different categories based on their prior shapes. In addition, from
the point of view of practicality, we designed an effective mea-
surement extraction module to get specific measurements, such
as the length of precipitations, from the final prediction results of
our network. In a variety of experiments, our method achieves the
highest mean average precision (mAP) of 0.475 and 0.298 among
different famous methods for bounding box detection and mask
segmentation tasks, respectively, which proves the effectiveness
of our method.
Index Terms—Graph cuts, instance segmentation, metallo-
graphic image, precipitation detection, prior knowledge.
I. INTRODUCTION
THE precipitations are nanoscale microstructures of alloy
materials, which play key roles in the mechanical prop-
erties of products, such as yield strength, ultimate tensile
strength, and elongation. As a result, it is extremely impor-
tant to measure the precipitates accurately. In this article,
we mainly focus on six-series aluminum alloy. It is an excel-
lent structural material due to its good formability, corrosion
resistance, weldability, and low cost [1]. In order to investigate
the nanoscale microstructures in materials, a transmission
electron microscope (TEM) is typically used [2]. When we observe the aluminum alloy with TEM under the standard setting, we find that the precipitates are embedded in the alloy matrix (aluminum) in horizontal, vertical, and longitudinal directions with expected size, as shown in Fig. 1.

Manuscript received December 25, 2020; revised April 12, 2021; accepted April 29, 2021. Date of publication May 6, 2021; date of current version May 19, 2021. This work was supported in part by the National Key Research and Development Program of China under Grant 2017YFB0306400 and in part by the National Natural Science Foundation of China under Grant 61773104. The Associate Editor coordinating the review process was Mohamad Forouzanfar. (Corresponding author: Dali Chen.)
Mingchun Li, Dali Chen, and Shixin Liu are with the College of Information Science and Engineering, Northeastern University, Shenyang 110819, China (e-mail: 407996328@qq.com; chendali@ise.neu.edu.cn; sxliu@mail.neu.edu.cn).
Fang Liu is with the School of Materials Science and Engineering, Northeastern University, Shenyang 110819, China (e-mail: liufang@smm.neu.edu.cn).
Digital Object Identifier 10.1109/TIM.2021.3077996
From the point of view of material science, these
microstructures are described as needle-shaped precipitates
(horizontal and longitudinal) and dot-shaped precipitates
(vertical), respectively, and are of considerable value for research and study [3]. Because the precipitates observed in TEM images are often ambiguous, with contours less distinct than those in natural images, traditional computer vision methods can hardly measure the precipitates directly.
For practical production and academic research, material sci-
entists need to manually measure the precipitates in these three
directions for each specimen, which is time-consuming and
boring.
Fortunately, in recent years, with the development of com-
puter vision, methods based on deep learning achieved awe-
some results in image classification [4], boundary detec-
tion [5], object detection [6], and image segmentation [7].
In fact, in the measurement field for materials science, deep
learning methods are also widely used. For example, for
steel inner microstructures, Azimi et al. [8] designed a fully
convolutional neural network (FCNN) under a novel max
voting strategy to obtain pixel-level segmentation of marten-
site, tempered martensite, bainite, and pearlite. For nickel-
based superalloy, Wang et al. [9] used typical U-Net to get
information of precipitates and established a microstructure-
related hardness model according to the segmentation results.
For aluminum alloy, Li et al. [10] employed generative adver-
sarial network (GAN) and multitask learning to achieve the
detection of the second phase particles and grain bound-
aries. These deep learning methods tend to perform better
than the traditional machine learning methods (support vec-
tor regression [11], shallow neural network [12], and mean
shift [13]) or rule-based methods (automatic thresholding [14],
level set [15], graph cuts [16], and ultimate opening [17]),
especially when dealing with more complex measurement
tasks. Therefore, in view of the complexity of TEM images in
this article, we designed an instance segmentation framework
based on deep learning to exactly detect precipitations in the
alloy.
As a natural extension of the object detection task, instance
segmentation aims to predict pixelwise object instance seg-
mentation and object category [18]. In recent years, it has
been widely used in the field of measurement, for example,
Fig. 1. Stereogram display of the three directions (horizontal, vertical, and longitudinal) of precipitations embedded in aluminum alloy.
detection and diagnosis of electrical equipment based on
infrared [19] and additional natural images [20]. In fact,
the major instance segmentation frameworks are based on
proposal segmentation, considering the great success of R-
CNN [21]. A typical example is mask R-CNN [22]. It consists
of two parts: region proposal network (RPN) and region of
interest network. Based on box regression and classification
involved in the second stage of faster R-CNN [23], it added
an additional mask prediction branch. At that time, it achieved
top performance for the MS COCO instance segmentation
task [24]. The success of this method is attributed to the excel-
lent performance of a fully convolution network (FCN) [25]
in segmentation tasks and the effectiveness of gradient prop-
agation under the region of interest align (RoIAlign) layers.
Following this route, in recent years, more instance segmenta-
tion methods have been proposed. For the strategy of instance
segmentation framework, Liu et al. [26] proposed PAFPN
that enhanced the entire feature hierarchy at the stage of
RPN and linked feature grid by adaptive feature pooling. For
the topological structure of instance segmentation framework,
cascade mask R-CNN [27], hybrid task cascade [28], and
mask scoring R-CNN [29] set up an extra block to improve
performance through cascade structures [27], [28] and quality
score modules [29]. It is worth noting that these methods
do not pay attention to the combination of prior knowledge.
This is understandable, of course, because these methods are
proposed for natural images, in which the information (size
or shape) of different categories is diffuse and unpredictable.
However, when there is obvious and predictable knowledge of
categories, such as the shape of metal microstructures, how to
effectively use prior knowledge is worth further consideration.
From the perspective of the training strategy, the most
intuitive way to employ prior knowledge is based on transfer
learning, which is widely used in a variety of deep learning
frameworks [30]. Transfer learning can expand the target
domain data by using the relevant source domain data, so as
to achieve the network with generalization ability and improve
the network performance. Typical paradigms include fine-
tuning parameters [31], domain adaptation [32], and so on.
It is one of the important tools to solve few-shot learning
from the perspective of data [33]. It implicitly transfers prior
knowledge to the network through relevant data. Besides that,
some recent studies showed that prior knowledge could be
fused to the end-to-end network more explicitly. For example,
when the shape of the foreground is known in the segmentation
task, Mirikharaji and Hamarneh [34] proposed a novel loss
term to encode the object shape and embed it into the loss
function to punish the predicted shape that does not satisfy
the prior knowledge. Han et al. [35] designed convex shape
sensitive loss function through a simple ergodic formula to
improve the robustness of the deep network to the noise and
reflection for pupil segmentation. When the location of each
category is known in advance, Zotti et al. [36] took the cardiac
distribution in 3-D position as prior knowledge and merged
it with the feature map before the classification layer in the
network to improve the performance of cardiac segmentation.
In addition to the training strategy, we can also modify the
structure of the network to fuse prior knowledge. Specifically,
for the image segmentation task, an obvious knowledge is that
the prediction results should be smooth and continuous. This
is very important for a dense prediction network, considering
that the network is usually trained pixelwise. In order to
address this problem, Chen et al. [37] designed an additional
conditional random field (CRF) module as postprocessing for
the prediction of a deep network to improve the localization
performance. Similarly, Zheng et al. [38] integrated the complex
energy function inference process of CRF into the running
logic of recurrent neural networks (RNNs), so as to realize the
end-to-end training with additional position and color informa-
tion. In fact, for the typical black-box model, such structural
modification can effectively make up for the deficiency of deep
learning and transfer some common and general knowledge to
the network to ensure the rationality of the prediction results.
In this article, our contributions can be summarized as
follows.
1) We proposed a two-stage instance segmentation frame-
work called prior mask R-CNN for automatic metallographic
precipitation measurement of aluminum alloys. At the RoI
network stage, we input the size information of each object for
recovering the area information between different categories
after the RoIAlign layer.
2) A new loss function based on normalized graph cuts is
proposed. By assigning weights in the graph based on different
rules for each category, we designed a shape-sensitive cut loss
function and embedded it into the mask training stage alongside
the original cross-entropy loss function.
3) We developed a simple postprocessing module to extract
the measurement information from the prediction results based
on the region-growing algorithm. This module could effec-
tively obtain quantitative information about the precipitates,
which plays a key role from the perspective of practical
applications.
II. METHODOLOGY
Here, we clearly point out that our task is to measure
the precipitates in TEM images of aluminum alloys. The
categories of precipitates can be divided into three types
according to their growing direction: horizontal precipitates,
vertical precipitates, and longitudinal precipitates (see Fig. 1).
First, we get the instance segmentation from the metallo-
graphic image by the proposed prior mask R-CNN. Second,
for the prediction results, we set up a postprocessing module to
obtain the specific measurement of each kind of precipitate.
In this section, we will specifically introduce the proposed
methodology in detail. Section II-A introduces the topology
structure of prior mask R-CNN and the specific size input
link. Section II-B demonstrates the cut loss function that
includes shape prior knowledge based on graph cuts. The
postprocessing module used to extract precipitate information
would be presented in Section II-C.
A. Structure of Prior Mask R-CNN
In this work, our ultimate goal is to help materials science
get concerned information from metallographic images. This
information mainly refers to the statistical information of
the size of precipitates, which can reveal the mechanical
properties of the alloy. Specifically, different from image
classification [39], [40] and semantic segmentation [41], our
task needs to detect each precipitate in the image and measure
it one by one. That is to say, it involves object detection [42]
and segmentation in turn. From the perspective of computer
vision, this is a typical instance segmentation task [43], [44].
Considering the challenge of specific noises (such as occlu-
sion, interference, and distortion) in the TEM images, on the
basis of the mask R-CNN framework, we introduce the size
information, and its specific structure is shown in Fig. 2.
In general, our network mainly consists of three parts:
backbone, RPN, and RoI network. The backbone is deployed
to obtain the abstract features of metallographic images by a
hierarchic convolution operation. On the basis of these con-
volution feature maps, RPN is employed to provide proposals
that may contain foreground at the first stage, and the RoI
network is used to fine-tune the proposal results and get the
mask segmentation at the second stage. Specifically, for back-
bone, we adopt the ResNet50 and feature pyramid networks
(FPNs). After constant stride operations in ResNet50, a series
of feature maps with different sizes are obtained. In FPN, these
feature maps are gradually fused, and finally, we get five
scale feature maps (x1–x5). At the stage of RPN, we set up
three kinds of anchors with different length-to-width ratios.
Here, we do foreground detection for each feature map of five
scales, respectively, instead of multiscale anchor for fusion
features. That is to say, the loss function of RPN is the sum
of box classification loss and regression loss from five feature
maps, as follows:
$$L_{RPN} = L_{rpn\_cls} + L_{rpn\_reg} = \sum_{i=1}^{5} \left[ L_{rpn\_cls}(x_i) + L_{rpn\_reg}(x_i) \right] \tag{1}$$
where $x_i$ indicates the feature map from the $i$th scale.
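As a minimal sketch of (1) in Python (the per-scale loss terms themselves are assumed to be computed elsewhere, e.g., the standard RPN cross-entropy and smooth-L1 terms), the multiscale summation is simply:

def rpn_loss(cls_losses_per_scale, reg_losses_per_scale):
    """L_RPN of (1): anchors are matched independently on each of the five
    FPN feature maps x1..x5, and the per-scale losses are summed."""
    assert len(cls_losses_per_scale) == len(reg_losses_per_scale) == 5
    return sum(cls_losses_per_scale) + sum(reg_losses_per_scale)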
The last part of the framework is the RoI network. It is
used to classify objects and segment each instance. Its input
involves two parts: the proposal from RPN and the feature
maps extracted based on the backbone. Among them, the effect
of the backbone is to extract deep features through learned
hierarchical convolution. The effect of RPN is to provide
candidate boxes that might contain the desired object from
the image. In order to obtain more accurate boundary boxes
of objects, nonmaximum suppression (NMS) based on IoU is
employed [22]. It could reject a region if it has an IoU overlap
with a higher scoring selected region larger than a certain
threshold. After NMS, we could get multiple instance-level
feature maps by cropping or resizing the image-level features
under the instructions of boxes. These feature maps will be fed
into the regression layer used to fine-tune the boundary box
and the classification layer used to identify the object category.
After the boundary box correction at the RoI network stage,
we obtain the refined feature maps by RoIAlign and feed them
into the mask layer used for object segmentation. Different from
the fully connected network at the classification and regression
layer, the segmentation layer is based on the convolution
network. The total loss of the RoI network could be expressed
as follows:
$$L_{RoI} = L_{cls} + L_{reg} + L_{mask}. \tag{2}$$
It is worth noting that the RoIAlign operation will change
the feature map of different scales into a unified scale through
bilinear interpolation. This operation can effectively transform
the objects of different scales to a uniform size, which is
necessary for the following network of classification and
segmentation tasks. In essence, RoIAlign [22] is a standard
Fig. 2. Structure of prior mask R-CNN.
operation for extracting a very small feature map from each
proposal box, regardless of the box size. That is to say, after the
RoIAlign operation, a large box and a small box alike are
converted into a normalized feature map indiscriminately
by interpolation. There is no doubt that such an operation will
greatly damage the scale information of the object, especially
when the scale has a clear correlation with the object category.
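This size-blindness is easy to see with torchvision's RoIAlign operator (an illustrative snippet with random features, not the exact configuration of this article):

import torch
from torchvision.ops import roi_align

feat = torch.randn(1, 256, 64, 64)                # one image-level feature map
boxes = torch.tensor([[0., 4., 4., 60., 40.],     # (batch_idx, x1, y1, x2, y2): a large box
                      [0., 10., 10., 14., 14.]])  # and a small box
crops = roi_align(feat, boxes, output_size=(7, 7), aligned=True)
print(crops.shape)  # torch.Size([2, 256, 7, 7]): both boxes, same size, area discarded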
To solve this scale damage problem, we assume that the sizes
of precipitates are related to the manufacturing process and
could be predicted when accurate precipitate information is
obtained. For example, the average size of the horizontal
precipitate is 1046.7 nm² in Fig. 3. That is, the object category
is highly dependent on the box size. However, all the feature
maps after RoIAlign would be the same size, and their original
scale information would be discarded. Therefore, we make the
following structural adjustment that constitutes our method’s
novelty and effectiveness.
For the classification (cls) layer and regression (reg) layer
of RoI, we adopt a four-layer fully connected network. First,
we flatten the 3-D feature map (channel × height ×
width) into a 1-D feature vector. Next,
the flattened features are fed into two fully connected layers
whose outputs are gradually reduced in turn. In the output
layer, we fuse the size information as an additional input
(green neural unit) with the output of the second fully con-
nected layer. Here, we point out that the size is calculated
based on the area of the boundary box, which is readily
available through a simple product operation from the RPN
results. Finally, the classification layer and the regression
layer based on the shared features, respectively, predict the
category of the object and the boundary box location. Specif-
ically, in Fig. 3, classes H, V, and L refer to the horizontal
precipitates, vertical precipitates, and longitudinal precipitates,
respectively; $t_{cx}$, $t_{cy}$, $t_{cw}$, and $t_{ch}$ are the offsets of the detected
box for category $c$. So far, through a simple skip connection,
we realized the size input to help the model fuse scale
information without additional high computational cost.
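A minimal PyTorch sketch of this head is given below. The layer widths, the class name, and the use of the raw (unnormalized) box area are illustrative assumptions rather than the exact configuration of this article; the point is the concatenation of the proposal's area with the output of the second fully connected layer before the classification and regression layers:

import torch
import torch.nn as nn

class SizeAwareRoIHead(nn.Module):
    """Classification/regression head with the box-size skip connection."""
    def __init__(self, in_channels=256, roi_size=7, hidden=1024, num_classes=4):
        super().__init__()
        flat = in_channels * roi_size * roi_size
        self.fc1 = nn.Linear(flat, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        # +1 for the box area appended as an extra input unit (the green neuron)
        self.cls_layer = nn.Linear(hidden + 1, num_classes)      # H, V, L, background
        self.reg_layer = nn.Linear(hidden + 1, num_classes * 4)  # (tx, ty, tw, th) per class

    def forward(self, roi_feats, boxes):
        # roi_feats: (R, C, 7, 7) features after RoIAlign; boxes: (R, 4) as (x1, y1, x2, y2)
        x = roi_feats.flatten(1)
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        # box area from the RPN proposal restores the discarded scale information
        area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
        x = torch.cat([x, area.unsqueeze(1)], dim=1)
        return self.cls_layer(x), self.reg_layer(x)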
B. Loss Function of Prior Mask R-CNN
Once the topology of the network is determined, we need
to consider specific learning objectives. The objective function
of the optimization problem is more commonly called the
loss function for machine learning. It plays a key role in
deep learning, which determines how to guide the parameters
in the model to update. Whether for supervised learning or
unsupervised learning, it is very important to set a reasonable
loss function. In this section, we will introduce the loss
function involved in prior mask R-CNN in detail.
In general, the loss function in prior mask R-CNN is mainly
involved at the RPN stage and the RoI network, which are
shown as (1) and (2), respectively. In (2), we find that the
loss function of the RoI network consists of three parts:
classification loss Lcls , regression loss Lreg, and segmentation
loss Lmask . In view of the structure of total loss function,
it could be regarded as multitask learning [45], which aims
to leverage valuable knowledge that is involved in related
tasks to improve the whole performance of the network. First,
we show the classification loss and the regression loss of the
RoI network for one sample, as follows:
$$L_{cls} = -k_c \sum_{c=1}^{C} g_c \log(s_c) \tag{3}$$
$$L_{reg} = k_c \sum_{c=1}^{C} \sum_{i=1}^{I} g_c \left| t^*_{ci} - t_{ci} \right| \tag{4}$$
where $k_c$ is the weight of category $c$ (horizontal, vertical, and
longitudinal precipitations), $g_c \in \{0, 1\}$ is the class-level binary
ground truth, $s_c \in [0, 1]$ is the class-level prediction result
after the activation function, and $t^* = (t^*_{cx}, t^*_{cy}, t^*_{cw}, t^*_{ch})$ is the real
offset set between the ground truth and the region of interest, whereas
$t = (t_{cx}, t_{cy}, t_{cw}, t_{ch})$ is the predicted offset set of the ground truth
and the interesting region.
In (3) and (4), we add the weight kcfor categories based
on standard cross-entropy loss and L1-norm loss function.
The reason why we do this is that the number of different
kinds of precipitates is obviously different. In fact, unlike the
unpredictability of objects in natural landscape images, each
metallographic image in our dataset contains all three kinds
of precipitates with different amounts at the same time. For
example, in a TEM image, the number of vertical precipitates
observed is always much more than that of longitudinal
or horizontal precipitates. Therefore, inspired by the class-
balanced strategy [46], we use the weight kcto alleviate the
imbalance problem of samples for the object classification and
Fig. 3. Proposed box size input (green neural unit) by skipping connection in the RoI network.
boundary box regression tasks. Specifically, $k_c$ is equal to the
total number of objects divided by the number of objects for
category $c$.
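For illustration (the counts below are hypothetical), $k_c$ can be computed directly from the instance counts:

def class_balance_weights(counts):
    """k_c = total number of objects / number of objects of category c."""
    total = sum(counts.values())
    return {c: total / n for c, n in counts.items()}

# e.g., vertical precipitates vastly outnumber the other two categories
print(class_balance_weights({"horizontal": 120, "vertical": 900, "longitudinal": 80}))
# {'horizontal': 9.17, 'vertical': 1.22, 'longitudinal': 13.75} (approximately)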
Besides, for the mask layer in the RoI network, its task is
to segment the region of interest. Different from the previous
classification layer and regression layer, it is a dense prediction
based on a convolution network. Here, we directly show the
mask loss function for metallographic precipitates in prior
mask R-CNN, as follows:
$$L_{mask} = \alpha L_{CE} + \beta L_B + \gamma L_{Cut} \tag{5}$$
where $L_{CE}$ means the typical cross-entropy loss, $L_B$ is the
boundary loss proposed in [47], $L_{Cut}$ indicates the proposed
cut loss in this article, and $\alpha = 1$, $\beta = 1.5$, and $\gamma = 1.5$ are
the weights for these three losses.
Among them, the cross-entropy loss LCE is the most
common loss function in segmentation tasks. It guides the
network learning by calculating the cross entropy of each pixel
independently based on ground truth. It can be expressed as
follows:
$$L_{CE} = -\sum_{c=1}^{C} G_c \cdot \log(S_c) \tag{6}$$
where $G_c \in \{0, 1\}^N$ is the binary vector based on the ground truth
with the shape of $N$ for category $c$, $N$ is the total number of
pixels in the image, $S_c \in [0, 1]^N$ is the predicted value vector
after the sigmoid function for the specific category $c$, and $\cdot$ means the
vector inner product.
In fact, from the nature of the loss function, the cross-
entropy loss as in (6) is to treat the image segmentation as
many isolated pixel classification problems, which is some-
what inconsistent with the human visual system. In order to
alleviate this problem, many novel design ideas are proposed
to compensate for the cross-entropy loss. For example, we can
directly optimize the evaluation index to improve the per-
formance of the model. Specifically, in [48], the Dice loss
function based on the similarity measurement dice coefficient
is proposed to solve the imbalance between foreground and
background voxels in medical images. In fact, for the class
imbalance problem of image segmentation, the Dice loss
function is widely employed for various deep networks and
respective tasks. Besides V-Net [48], in [49], it is used to
train a fully convolutional DenseNet for diffusion-weighted
images. Similarly, for vessel segmentation in the X-ray
coronary angiography image sequence [50], the Dice loss
is selected as a loss function to train an encoder–decoder
framework with a channel attention mechanism to tackle the
class imbalance problem. Similarly, the Hausdorff distance
used to quantify the difference between two sets is also
encoded as a simple loss function, which is estimated by
three approximate methods in [51]. In addition, the perceptual
loss from the deep network, the loss function based on the
region, and the energy-based loss function are also widely
used to solve their respective problems [52]. No doubt, it is
important to design an appropriate loss function according
to the specific situation. Considering that the morphology
of precipitates in TEM metallographic images is predictable
(vertical precipitates as dot-shaped, horizontal precipitates,
and longitudinal precipitates as needle-shaped with different
directions), which is effective prior knowledge, we propose
cut loss based on normalized graph cuts to compensate for
the cross-entropy loss.
Graph cuts are an effective unsupervised image segmen-
tation method based on graph clustering. For the binary
classification of foreground and background in image I,let
the point set of foregrounds be Aand that of backgrounds be
B,thatis, AB=∅,AB=I. The graph cuts can be
expressed as follows,
Cut(A,B)=
uA,vB
w(u,v),(7)
where w(u,v) indicates the designed weight between uand v.
Specifically, to improve the performance of segmentation,
a regularized and extended version of the cut measurement
named normalized graph cuts [53] can be written as
$$\mathrm{NCut}(A, B) = \frac{\mathrm{Cut}(A, B)}{\mathrm{assoc}(A, I)} + \frac{\mathrm{Cut}(A, B)}{\mathrm{assoc}(B, I)} \tag{8}$$
where $\mathrm{assoc}(A, I) = \sum_{u \in A, t \in I} w(u, t)$ indicates the sum of the
weights between the points in $A$ and all the points in image $I$;
$\mathrm{assoc}(B, I)$ is defined in the same way.
From the perspective of optimization, once the weight
matrix $W$ is determined, the objective function based on
normalized graph cuts is
$$\min_{V_c} \sum_{c=1}^{C} g_c \frac{V_c^T W_c (\mathbf{1} - V_c)}{D_c^T V_c} \tag{9}$$
where $W_c \in \mathbb{R}^{N \times N}$ is the weight matrix for category $c$, $g_c \in
\{0, 1\}$ is the class-level binary ground truth representing the
categories contained in image $I$, $D_c = W_c \mathbf{1}$ means the sum
of the elements of each row in $W_c$, and $V_c \in \{0, 1\}^N$ is the
decision variable.
Here, we expand the binary classification task between
foreground and background into a multiclassification problem,
which is more appropriate for our task. It is obvious that, for
the optimization problem in (9), the decision variable $V_c$ is
the final segmentation result based on graph cuts. In order to
solve this kind of problem, an effective method is to transform
it into an eigenvalue problem based on the Rayleigh
quotient [53]. Furthermore, if we relax the hard constraint on
$V_c$ from $\{0, 1\}^N$ to the soft $[0, 1]^N$, we can regard it as the probability
output $S_c$ of the deep network after the sigmoid activation function.
In other words, we transform the intuitive
optimization of decision variables into the optimization of
network parameters in deep learning. Inspired by [54], such
an optimization problem could be used as loss functions in
a deep network and solved iteratively by a backpropagation
algorithm based on gradient. The proposed cut loss function
and its gradient can be written as follows:
LCut =
C
c=1
gc
ST
cWc(1Sc)
DT
cSc
(10)
LCut
∂θ =
C
c=1gc
ST
cWc(1Sc)
DT
cSc
∂θ =
C
c=1
gc
ST
cWc(Sc)
DT
cSc
∂θ
=
C
c=1
gcST
cWcScDc
DT
cSc22WcSc
DT
cScSc
∂θ (11)
where θindicates the parameter of the deep network.
In (10) and (11), the category weight matrix $W_c$ specifies the
correlation strength between pixels, which largely determines
the final segmentation result. Generally speaking, the weight
matrix is symmetric, and each element needs to be calculated
independently. A simple way to obtain the weight matrix is to
use a kernel function based on pixel feature vectors, which
is the popular way to build the energy function in CRFs [55]:
$$W_c(i, j) = k(F_i, F_j) = e^{-(F_i - F_j)^T \Sigma_c^{-1} (F_i - F_j)} \tag{12}$$
where the pair $i$ and $j$ refers to the index position in the matrix
$W_c$, $F_i$ is the feature vector for pixel $i$ (with the same definition
for $F_j$), and $\Sigma_c^{-1}$ indicates the inverse of the covariance matrix.
Considering the smoothness of prediction and the difference
of categories, the features of pixels include the 2-D position
information {X, Y} and the color information with three
channels {R, G, B}. Here, for the sake of simplicity, we only
consider the gray image with a single channel G. In other words,
the specific representation of the feature vector $F_i$ for pixel $i$
is $[x_i, y_i, g_i]$. Next, we discuss the relationship between the
weight matrix and prior knowledge.
In fact, when the shape of a particular category is known
in advance, the statistics characteristics of positions are also
predictable. This gives us inspiration when setting the weight
matrix in the cut loss function. As shown above, our task
is to segment the precipitates in three different directions:
horizontal, vertical, and longitudinal. The precipitates with
different shapes have their own statistic characteristics of the
position. For example, for horizontal precipitates, the position
axes $X$ and $Y$ are negatively correlated. In contrast, $X$ and
$Y$ in longitudinal precipitates are always positively correlated.
In fact, this phenomenon still exists after the RoIAlign of the
prior mask R-CNN framework because RoIAlign does not change
the sign of the correlation between the position axes $X$ and $Y$.
An example is shown in Fig. 4.
In Fig. 4, we show the weight matrix of the central
pixel, which can be obtained by reshaping the middle
row or column of the category weight matrix $W_c$. First,
it should be pointed out that, after RoIAlign of the feature
map, we additionally need to extract the raw image patch for the
same boundary box proposal to obtain the color information,
as shown in the upper left of Fig. 4. Here, we set three kinds
of correlation degrees $\tau_{xy} = \{+1.5, 0, -1.5\}$ along with the color
channel to illustrate the difference of the weight matrix under
the respective strategies. Among them, the color channel focuses
on gray difference, whereas the position focuses on smoothness.
The weight matrix of Color Channel + XY Channel ($\tau_{xy} = +1.5$)
(lower middle) is the closest to the binarization matrix
based on the ground truth, which is also the most consistent
with the statistical characteristics of the horizontal precipitate.
This shows that, when we know the shape of a specific
category in advance (more specifically, its location statistics),
the prior knowledge can be encoded through
the covariance matrix $\Sigma_c$ involved in the weight matrix $W_c$
and thus fused into the loss function $L_{Cut}$. Here, we directly
give the inverse of the matrix $\Sigma_c$, which can be used
in (12):
$$\Sigma_c^{-1} = \begin{bmatrix} \tau_p & \tau_p \tau_{cxy} & 0 \\ \tau_p \tau_{cxy} & \tau_p & 0 \\ 0 & 0 & \tau_g \end{bmatrix} \tag{13}$$
where $\tau_p = 1/10^2$ and $\tau_g = 1/16^2$ are the weights for the
position and color channels, and $\tau_{cxy} = \{+2.5, 0, -2.5\}$ refers
to the correlation degree, which can be selected according to the
opposite sign of the covariance of each category (horizontal, vertical,
and longitudinal).
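Putting (10), (12), and (13) together, the sketch below computes the cut loss for one RoI and one category; the patch size, tensor layout, and gray-level range are illustrative assumptions. Since $S_c$ is the network output, the gradient (11) is obtained automatically by backpropagation (autograd), so it never needs to be coded by hand:

import torch

def inv_cov(tau_cxy, tau_p=1 / 10**2, tau_g=1 / 16**2):
    """Inverse covariance matrix of (13) for one category."""
    return torch.tensor([[tau_p,           tau_p * tau_cxy, 0.0],
                         [tau_p * tau_cxy, tau_p,           0.0],
                         [0.0,             0.0,             tau_g]])

def weight_matrix(gray, tau_cxy):
    """Dense weight matrix W_c of (12) for an H x W gray patch.

    Per-pixel features are F = [x, y, g]; W[i, j] = exp(-(Fi-Fj)^T S^-1 (Fi-Fj)).
    """
    H, W = gray.shape
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32), indexing="ij")
    F = torch.stack([xs.flatten(), ys.flatten(), gray.flatten()], dim=1)  # (N, 3)
    diff = F[:, None, :] - F[None, :, :]                                  # (N, N, 3)
    dist = torch.einsum("ijk,kl,ijl->ij", diff, inv_cov(tau_cxy), diff)
    return torch.exp(-dist)                                              # (N, N)

def cut_loss(S, W):
    """Normalized-cut loss (10) for one present category (g_c = 1).

    S: (N,) sigmoid output of the mask layer; W: (N, N) weight matrix.
    """
    D = W.sum(dim=1)  # D_c = W_c 1
    return (S @ (W @ (1.0 - S))) / (D @ S)

For a 28 × 28 RoI, N = 784, so the dense N × N matrix stays small, which matches the remark below that the computational cost after RoIAlign is acceptable.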
Objectively speaking, scholars have proposed many dif-
ferent loss functions for segmentation tasks. According to
the category in [56], our cut loss could be regarded as the
region-based loss compared with the pixelwise CE loss. It is
worth mentioning that the cut loss does not require pixel-level
Fig. 4. Weight matrix of the central pixel under different strategies.
annotations, which is an interesting characteristic. It can be
seamlessly integrated into weakly supervised learning [57].
We only need class-level annotations to indicate the shape. The
category information will be encoded in the weight matrix Wc,
which combines the original image information. Specifically,
by setting different covariance, we can specify the smoothness
preference to maintain the prediction shape of the category.
From the perspective of computational efficiency, unlike the
complex inference process in CRF based on energy function
minimization, our loss function can be easily realized through
gradient backward propagation in deep learning, as shown in
(11). Of course, it is still time-consuming to calculate the huge
weight matrix Wc. However, since the raw images and the
feature maps are reduced to a smaller scale after RoIAlign,
the computational power loss is acceptable.
In addition to the statistical characteristics of the position,
the contour is a more explicit and intuitive descriptor for
the shape of the object. When the pixel-level annotation is
available, it is meaningful to measure the distance between the
predicted boundary and the ground-truth boundary to transfer the
shape information. Here, we use the boundary loss proposed
in [47] as another loss function of the mask layer to improve
the performance. We directly show the final nonsymmetric
L2 distance result after approximation as follows:
$$L_B = \sum_{c=1}^{C} g_c \left( \phi_G \cdot S_c \right) \tag{14}$$
where $\phi_G \in \mathbb{R}^N$ is a distance vector that can be calculated in
advance with the same shape as $S_c$.
In (14), every element in $\phi_G$ represents the signed distance
between the current pixel and the nearest real boundary in the
ground truth. Specifically, if the current pixel is inside the ground
truth, the sign of the distance is negative. Otherwise, it is
positive. In other words, in order to minimize the boundary
loss LB, we need to maximize the predicted values for the
positive pixels and minimize the predicted values for the
negative ones at the same time. This is in line with perceptual
cognition. It should be noted that, unlike conventional loss
functions, such as cross entropy, its results may be negative
due to the approximation in the mathematical derivation that is
used for simplified calculation. However, this does not affect
its effectiveness in a deep network, which has been verified in
medical imaging [47]. In fact, its initial aim is to measure
the distance between two curves by integrating pixels on
the boundary. In view of differentiable requirements and the
limitation of computational power, the loss function is simpli-
fied as the inner product of two vectors after approximation.
To some extent, it is equivalent to an L1-norm loss function
with pixelwise weight, which implies useful information about
boundaries and shapes.
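A sketch of precomputing $\phi_G$ with SciPy's Euclidean distance transform and evaluating (14) for one category follows; the exact level-set construction in [47] may differ in boundary handling:

import numpy as np
from scipy.ndimage import distance_transform_edt

def signed_distance(gt_mask):
    """phi_G of (14): negative inside the ground-truth region, positive outside."""
    gt = gt_mask.astype(bool)
    if not gt.any():                       # no foreground: define phi_G = 0
        return np.zeros(gt.shape)
    outside = distance_transform_edt(~gt)  # distance to the region for outer pixels
    inside = distance_transform_edt(gt)    # distance to the background for inner pixels
    return outside - inside

def boundary_loss(S, gt_mask):
    """L_B of (14) for one present category: inner product <phi_G, S_c>."""
    return float(signed_distance(gt_mask).ravel() @ S.ravel())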
At this point, we have completed the design of the seg-
mentation loss function of the mask layer in the proposed
prior mask R-CNN. In general, we set up three loss functions:
pixelwise cross-entropy loss LCE in (6), boundary loss LB
in (14) based on boundary (shape) measurement, and the
proposed cut loss LCut in (10) considered shape statistical
characteristics, as shown in Fig. 5.
Besides, in metallographic images, noise exists objectively
and can be divided into three categories: occlusion (inclination
fringes), interference (dislocation), and distortion (residual).
They are caused by the observation process, the specimen
itself, and the preparation, respectively. Among them, noise
caused by occlusion affects performance the most. More
seriously, considering the imaging principle of TEM, occlusion
(equal inclination or thickness fringes) is very common in
images. When the precipitates appear near the occlusions,
the model needs to be able to repair the occluded part.
From the perspective of noise suppression, our designed loss
function (see Fig. 5) could fill in the occluded part effectively
Fig. 5. Loss function of the mask layer in prior mask R-CNN.
Algorithm 1 Postprocessing for Information Extraction
Input: mask list M = [m_1, m_2, ..., m_p], category list C = [c_1, c_2, ..., c_p]
Output: precipitate length list L = [l_1, l_2, ..., l_p]
Initialize list L = []
For i in range(p):
    Extract m_i from mask list M and c_i from category list C
    Get region instances r_i by the region-growing algorithm based on m_i
    Filter out nonmaximum areas in r_i
    Detect eight corner key points (p_1, p_2, ..., p_8) in r_i
    If c_i == horizontal:
        l_i = Mean[Dist(p_1, p_8), Dist(p_2, p_7), Dist(p_3, p_6), Dist(p_4, p_5)]
    elif c_i == longitudinal:
        l_i = Mean[Dist(p_1, p_4), Dist(p_2, p_3), Dist(p_5, p_8), Dist(p_6, p_7)]
    else:
        l_i = Mean[Dist(p_1, p_5), Dist(p_2, p_6), Dist(p_3, p_7), Dist(p_4, p_8)]
    Append l_i to L
Return precipitate length list L
by introducing the prior knowledge of shape and contour,
which is a significant improvement.
C. Postprocessing Module for Measurement
For the analysis of precipitates in this article, our ultimate
aim is to help material scientists measure the precipitates,
rather than solve a pure computer vision problem. The mea-
surement here refers to the statistical information of the three
kinds of precipitates, such as the distribution or mean value
of precipitates’ length, which is very important to reveal the
mechanical properties of the alloy. Therefore, from the practi-
cal point of view, we design a postprocessing module to extract
valuable information on the basis of instance segmentation.
Fig. 6. Flowchart of the postprocessing module.
Specifically, from the output of the computer vision network
to measurement acquisition, we mainly face two inevitable
problems. First, the segmentation results of network output
are not always connected, considering that it is obtained
by aggregating each pixel prediction independently. In other
words, a predicted mask may contain multiple isolated regions
that are treated as precipitates at the same time, which might be
caused by the visual noise in TEM images. Second, how to get
robust and reliable length information from irregular connected
domains is another problem that needs attention. In view
of the above problems, we designed a simple postprocessing
module based on region growing and key points’ detection for
measuring the precipitations, and the implementation details
are shown in Algorithm 1.
In brief, Algorithm 1 mainly includes a region-growing
algorithm, area filter, key points’ detection, and category-
wise length measurement. First, we use the seeded region-
growing algorithm [58] to get the region instances ri. Under
eight-neighbor pixels’ strategy, the selected seed points (pre-
dicted as foreground pixels) are grown to get the instance of
the connected domain. Next, nonmaximum areas are filtered
out to eliminate the effect of noise and obtain the real precipi-
tates. Then, eight corner points are detected in turn by simple
maximum and minimum functions based on plane position.
We point out that these corners may coincide, considering
the irregular shape. Finally, we set up different distance
measurement rules according to the categories of precipitates,
which is consistent with the statement in materials science. For
example, for horizontal precipitation, the length is based on
the long side. In contrast, for vertical precipitation, the length
refers to its diameter. An example of a postprocessing module
is shown in Fig. 6.
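For concreteness, the following runnable Python sketch mirrors Algorithm 1 for a single predicted mask. Connected-component labeling under eight-connectivity stands in for the seeded region growing, and, since the exact ordering of the eight corner points is not reproduced here, the category-wise pairing rule is approximated by the farthest extreme-point distance (needles) or the equivalent diameter (dots):

import numpy as np
from scipy.ndimage import label

EIGHT = np.ones((3, 3), dtype=int)  # eight-neighbor connectivity

def precipitate_length(mask, category, nm_per_pixel=0.15625):
    """Simplified Algorithm 1 for one binary mask: length in nanometers."""
    labeled, n = label(mask, structure=EIGHT)        # region growing stand-in
    if n == 0:
        return 0.0
    sizes = np.bincount(labeled.ravel())[1:]
    region = labeled == (1 + np.argmax(sizes))       # filter out nonmaximum areas
    ys, xs = np.nonzero(region)
    if category == "vertical":                       # dot shape: equivalent diameter
        return 2.0 * np.sqrt(region.sum() / np.pi) * nm_per_pixel
    # needle shape: farthest distance among the extreme points of the region
    pts = np.stack([xs, ys], 1).astype(float)
    extremes = pts[[np.argmin(xs), np.argmax(xs), np.argmin(ys), np.argmax(ys),
                    np.argmin(xs + ys), np.argmax(xs + ys),
                    np.argmin(xs - ys), np.argmax(xs - ys)]]
    d = np.linalg.norm(extremes[:, None] - extremes[None, :], axis=-1)
    return d.max() * nm_per_pixel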
III. RESULTS
In this section, we will show a variety of experiments to
test the effectiveness of the proposed prior mask R-CNN for
the detection of precipitates in alloys.
A. Dataset
In this article, our experimental object is direct chill cast
Al–12.7Si–0.7Mg alloy without further chemical modification,
Fig. 7. Left: training samples (blue box) and test samples (red box) in one
slice. Right: overlapping samples (blue box) by sliding for augmentation.
which is widely used as a structural material. The dataset
contains 30 metallographic slices with the size of 2048 ×
2048 under different heat treatment conditions (such as dif-
ferent aging times and aging temperatures). It was observed
by transmission electron microscopy at a scale of 1 pixel:
0.15625 nm. For these metallographic images, we can find
a series of precipitates based on Mg and Si growing in three
orthogonal directions. Specifically, we call them horizontal,
vertical, and longitudinal precipitates, which are labeled at
pixel level by a material expert and seven volunteers in our
team. Here, we point out that the terms “horizontal” (about
+30° to the horizontal axis) and “longitudinal” (about −60°
to the horizontal axis) in this article are not strict. They are
only used to distinguish each other. After annotation, we divide
each slice into four parts and distribute them to the training
set and the test set, respectively. As with many deep learning
projects, we augment our metallographic dataset to expand
the training samples used to train the network. It should be
noted that not all typical image augmentation methods and
affine transformation are allowed in view of the clear material
science significance of precipitates in metallographic images.
For example, image rescaling or rotation may lead to weird
precipitates that cannot be observed in practice, at least under
the current TEM settings. In contrast, overlapping cutting
in the slice is allowed, as shown in Fig. 7. Finally, after
data augmentation, our training set contains 300 (90 raw +
210 augmentation) images with the size of 1024 ×1024,
whereas the test set contains 30 images of the same size.
B. Instance Segmentation Results
First, we introduce some quantitative indicators used to
evaluate our method. For the instance segmentation task,
generally speaking, the performance of the model is evaluated
from two aspects: object detection and mask segmentation.
Among them, for object detection tasks, a common evaluation
index is the mean average precision (mAP). It is popular for
natural image tasks, such as MS COCO [24] and PASCAL
VOC challenge [59], as follows:
$$\mathrm{mAP} = \frac{1}{C} \sum_{c=1}^{C} \mathrm{AP}_c = \frac{1}{C} \sum_{c=1}^{C} \frac{1}{N_c} \sum_{TP=1}^{N_c} \max_{tp \ge TP} \frac{tp}{tp + FP(tp)} \tag{15}$$
where $N_c$ means the total number of instances for category $c$,
$tp$ means the number of truly detected objects, and $FP(tp)$ is
a specific function that returns the number of falsely detected
objects; specifically, if $tp$ samples can be detected, the
minimum number of false positives is returned; otherwise,
infinity is returned.

TABLE I
HYPERPARAMETERS INVOLVED IN OUR METHOD
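A sketch of $AP_c$ in (15): with the detections of one category sorted by descending confidence and flagged as true or false positives (the IoU-based matching step is assumed to have been done already), the maximum over $tp \ge TP$ is the usual interpolated precision:

import numpy as np

def average_precision(is_tp, num_gt):
    """AP_c of (15). is_tp: true/false-positive flags of one category's
    detections, sorted by descending confidence; num_gt: N_c instances."""
    is_tp = np.asarray(is_tp, dtype=bool)
    tp = np.cumsum(is_tp)               # cumulative true positives
    fp = np.cumsum(~is_tp)              # cumulative false positives
    precision = tp / (tp + fp)
    ap = 0.0
    for TP in range(1, num_gt + 1):     # one recall level per ground-truth hit
        reachable = precision[tp >= TP] # operating points with at least TP hits
        ap += reachable.max() if reachable.size else 0.0  # else FP(TP) = infinity
    return ap / num_gt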
Next, we will do experiments to test the performance of the
proposed prior mask R-CNN. The specific hyperparameters of
this work are shown in Table I. Overall, we basically inherited
the typical settings in mmdetection [60]. For example, in each
training iteration, up to 256 anchors and 512 RoI are randomly
selected to guide RPN and RoI network learning, respectively.
The maximum number of proposal boxes from RPN is 1000,
and the NMS threshold for the positive sample is set to 0.7.
Considering the specificity of our task, some hyperparameters
need to be adjusted accordingly. In view of the small size
of vertical precipitates, we set the scale of the anchor to 4
to ensure that they can be fully detected. As for the learning
strategy, we use Adam [61] with a learning rate of 0.0003,
and the training epoch is set to 15. After the 15th training
epoch, 30 test images that did not appear in the training period
would be used to evaluate the performance. Instances with the
predicted probabilities higher than 0.3 will be considered valid,
and based on this, the quantitative index will be calculated
by (15). In order to further verify the effectiveness of the pro-
posed method, mask R-CNN [22], mask scoring R-CNN [29],
and cascade mask R-CNN [27] are also considered under the
same dataset as a comparison. The quantitative indicators are
shown in Table II.
In Table II, we list the mean mAP and each category
AP at the same time, where the subscripts H, V, and L
refer to horizontal, vertical, and longitudinal precipitations.
The definition of AR is based on the same rule, and the
bold number of each column is the best performance of the
corresponding evaluation index. Considering universality and
fairness, mAP is selected as the main index to comprehensively
evaluate methods.
On the whole, the performances for object detection (upper
of Table II) are generally better than that for mask segmenta-
tion (lower of Table II). This degradation could be understood,
TABLE II
PERFORMANCE OF OBJECT DETECTION (UPPER) AND MASK SEGMENTATION (LOWER) AMONG MASK R-CNN, MASK SCORING R-CNN, CASCADE MASK R-CNN, AND THE PROPOSED PRIOR MASK R-CNN
considering that mask segmentation is often based on the
results of object detection for the typical two-stage instance
segmentation framework. However, whether for object detec-
tion (upper of Table II) or mask segmentation task (lower of
Table II), the proposed prior mask R-CNN achieves better
performances in more evaluation indexes. Among them, for the
main index mAP of object detection, our algorithm achieves
the highest score of 0.475, which is ahead of 0.397 from mask
R-CNN, 0.447 from scoring R-CNN, and 0.378 from cascade
R-CNN. The situation of mask segmentation is basically the
same, and our method achieves the highest score of 0.298 for
the mask segmentation task. It is obvious that our method
should be more effective and appropriate for the detection and
segmentation of metallographic precipitates.
Besides, we observe significant differences in performance
among different categories. For example, in the upper of
Table II, the minimum $AP_V$ for vertical precipitates among all
methods is 0.496 (mask R-CNN), whereas the maximum $AP_H$
for horizontal precipitates is only 0.356 (prior mask R-CNN).
This phenomenon is even more obvious in the mask segmen-
tation task. This is mainly due to the difficulty in predicting
the horizontal or longitudinal precipitates. In view of the
imaging principle of TEM, the horizontal and longitudinal
precipitates are often blurred with inexact contour compared
with the obvious dark gray vertical precipitates with a circular
shape, as shown in Fig. 1. In addition, for the prediction of
rectangle (needle) shape with a large length-to-width ratio of
horizontal and longitudinal precipitates, conventional convolu-
tion networks may encounter difficulties. It is worth noting that
this performance difference between categories is relatively
small for our proposed prior mask R-CNN. Fusing prior
knowledge into the deep network by our specific structure
(see Section II-A) and loss function (see Section II-B) might
alleviate the phenomenon. In addition, in order to compare
different methods more intuitively, we show prediction results
directly for the test set, just as in Fig. 8.
In Fig. 8, we selected three TEM metallographic images
to show the prediction results, which are realized by the
mmdetection toolbox [60]. The first row in Fig. 8 is the overall
prediction results of different methods, whereas the second
and third rows focus more on the boundary box detection
effect and mask segmentation result, respectively. Specifically,
for the second row in Fig. 8, our method predicts more
precipitates with higher scores, such as yellow horizontal
precipitates. This implies that our method is sensitive to the
complex precipitates, which is consistent with the high recall
rate (mAR = 0.586) in Table II. Unlike mask scoring R-CNN
and cascade mask R-CNN, which add extra subnetwork to
the topology structure of mask R-CNN, our method only fuses
the size input through a simple skip connection (see Fig. 3).
This is helpful for scale-sensitive classification problems, such
as precipitate detection in this article, so our method can
effectively detect more precipitates accurately. Besides, for the
third row in Fig. 8, our mask segmentation results are closest
to the shape of annotated precipitates in the ground truth. This
might be related to two additional segmentation loss functions
in prior mask R-CNN. To be more precise, the proposed cut
loss (10) that contains prior knowledge guides the network to
produce a smooth and consistent mask by setting different
weight matrices according to the statistical characteristics. The
boundary loss (14) further ensures the rationality of the
predicted shape by measuring the contour distance between
prediction and ground truth. All these specific settings enable
our method to achieve better results.
Furthermore, we point out that the selection of hyperpa-
rameters in the model is ad hoc without using a validation
set. It implies that the hyperparameters in Table I may not
be optimal. The main criteria for selecting them are based
on the specific situation of our task. For example, we set the
“Anchor Scale” to 4 to ensure that the vertical precipitates
with an average size of 26 nm² can be detected effectively.
The selections of the correlation coefficients $\tau_{Hxy} = +2.5$,
$\tau_{Vxy} = 0$, and $\tau_{Lxy} = -2.5$ in the cut loss are based
on the statistical knowledge from the currently available
dataset. In addition, practicability is also an important cri-
terion. We changed the “Threshold for Test” from a typical
0.05 to 0.3. This correction leads to the degradation of the
mAP score (from 0.503 to 0.475) but effectively reduces
the false positive rate, which is more valuable for material
experts. Of course, the settings of all methods in Table II
are basically consistent, except for some inherent structures
or loss functions (e.g., additional scoring layer in Scoring
R-CNN [29]). Under the above configuration, the performance
improvement of our prior mask R-CNN is relatively obvious,
just like Table II.
As mentioned above, different from the natural image
challenge, the evaluation index based on computer vision is not
the most important for the actual microstructure detection task.
From a practical point of view, our method should be able to
extract useful information from TEM images, which is helpful
Fig. 8. Prediction results between mask R-CNN, mask scoring R-CNN, cascade mask R-CNN, and the proposed prior mask R-CNN.
for material scientists to measure and analyze. Therefore,
in the following, we test the performance of the postprocessing
module for measurement proposed in this article. In order
to get the results more fairly, we selected three different
batches in the test set. These three batches are produced under
different heat treatments, specifically aging time, which is
meaningful to study the mechanical properties of the alloy.
That is to say, the difference between these test images is
even more obvious due to the different production processes
and inevitable changes in the environment. In addition, other
methods are also considered for comparison. The results are
shown in Table III.
In Table III, we show the average length of three kinds of
precipitates for three batches. H, V, and L refer to the hor-
izontal, vertical, and longitudinal precipitations, respectively.
The GT in the last row indicates the real annotated results by
experts, and the bold numbers in each column are the closest
results to the ground truth. Generally speaking, the post-
processing results based on prior mask R-CNN are more con-
sistent with the real results, no matter for the aging time of 1 or
12 h. This shows that the measurement results based on region
growing and key points’ detection (see Section II-C) can
accurately extract the material science information from the
network prediction results. It further proves the effectiveness
and robustness of our method. However, we must point out
that the accuracy under the current dataset is preliminary (maximum
error = 5 nm), which is not sufficient for material analysis.
Nevertheless, with more accurately and finely labeled metallographic
data, our method still has the potential to be used in the actual
production process.
It is worth mentioning that, in the actual production, the size
of the image may not be consistent with the image (1024 ×
1024) in this article. If we want to use the trained model to
predict different sizes of images directly, we need to manually
convert the predicted images into the same scale (1 pixel:
0.15625 nm). In other words, our model is more sensitive to
scale than to size. This may be related to the mechanism of the
convolutional network. Furthermore, by converting the scale,
more metallographic images can be used for training or testing.
Considering the value of TEM images, this is very meaningful
compared with directly discarding these data.
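As a small sketch of this scale normalization (the source pixel size src_nm_per_pixel is an assumed input; OpenCV is used only for the resampling):

import cv2

TRAIN_NM_PER_PIXEL = 0.15625  # scale used throughout this article

def to_training_scale(image, src_nm_per_pixel):
    """Resample a TEM image so that 1 pixel corresponds to 0.15625 nm."""
    factor = src_nm_per_pixel / TRAIN_NM_PER_PIXEL
    h, w = image.shape[:2]
    return cv2.resize(image, (round(w * factor), round(h * factor)),
                      interpolation=cv2.INTER_LINEAR)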
C. Ablation Study
For our proposed prior mask R-CNN, we make two
improvements to the basic mask R-CNN framework, from the
network structure and loss function to more accurately detect
the precipitates in the alloy. Specifically, in terms of structure,
we introduce size input in the classification and regression
layer of the RoI network by skipping connection. For the loss
function, the weakly supervised loss function (10) based on
traditional graph cuts and the boundary loss function (14)
based on distance are used to segment in the mask layer
of the RoI network. These two improvements together make
our method achieve better performance, whether from the
perspective of computer vision or practical point, as shown
in Tables II and III. In this section, the specific role of these
two improvements will be analyzed in detail. Specifically,
under the same training settings and dataset, we make addi-
tional experiments on basic mask R-CNN with only structural
improvement and only loss function improvement for ablation
study. The final object detection performance, mask segmenta-
tion performance, and prediction results are shown in Table IV
and Fig. 9, respectively.
The term “Specific Structure” in Table IV corresponds
to the “Size Input Structure” in Fig. 9, which refers to the basic
TABLE III
AVERAGE LENGTH OF PRECIPITATES FOR THREE BATCHES (AGING TIME: 1, 3, AND 12 H) IN THE TEST SET AFTER THE INFORMATION EXTRACTION MODULE
TABLE IV
PERFORMANCE OF OBJECT DETECTION (UPPER) AND MASK SEGMENTATION (LOWER) OF BASIC MASK R-CNN, MASK R-CNN WITH SIZE INPUT STRUCTURE, MASK R-CNN WITH ADDITIONAL CUT AND BOUNDARY LOSS, AND COMPLETE PRIOR MASK R-CNN
Fig. 9. Prediction results between basic mask R-CNN, mask R-CNN with size input, mask R-CNN with cut and boundary loss, and complete prior mask
R-CNN.
mask R-CNN with additional size input. Similarly, “Specific
Loss” corresponds to “Cut&Boundrary Loss,” which indicates
the basic mask R-CNN with cut and boundary loss function.
As before, the bold entries indicate the best performance for each evaluation index. First, from the quantitative results in Table IV, it is obvious that both “Specific Structure” and “Specific Loss” improve the performance of the basic mask R-CNN. In the object detection task, from the mAP column of the main evaluation index in the upper part of Table IV, we find that the improvement from “Specific Structure” (from 0.397 to 0.446) is slightly larger than that from “Specific Loss” (from 0.397 to 0.445). Conversely, the improvement from “Specific Loss” (from 0.242 to 0.296) is larger than that from “Specific Structure” (from 0.242 to 0.264) in the mask segmentation task, based on the lower part of Table IV. A similar pattern holds for the other indicators, such as mAR, which measures the recall of the model.
This phenomenon is consistent with the original intention of the two designed improvements. Specifically, for the structural improvement, we input the size information into the classification and regression layers at the object detection stage, after observing the obvious difference in the size distributions of different types of precipitates (see Fig. 3). This is helpful for the size-sensitive microstructure classification task, which explains the improved object detection performance after the structural improvement of the basic mask R-CNN.
Besides, the loss improvement is mainly designed for the mask segmentation task. Based on the predictable shapes of different precipitates, we set the cut loss to produce smooth and proper prediction results with a category-related preference. In addition, because the model is based on the typical two-stage instance segmentation framework, the RPN and the classification, regression, and mask layers of the RoI network all take their inputs from the same backbone convolutional network (ResNet50 + FPN in this article). That is to say, the improvement of any single branch may be linked to the others, which also explains why the results in Table IV tend to show correlated gains across tasks. For example, the additional cut loss set in the mask layer also significantly improves the object detection task, in addition to the segmentation task it was originally designed for.
Finally, we test the effect of the proposed cut loss, which is an important contribution of this article. In short, the cut loss is a segmentation loss function inspired by graph cuts theory. Compared with the pixel-level annotation required by the cross entropy, the cut loss only needs class-level annotation and the corresponding statistical characteristics, which also makes it suitable for weakly supervised learning. The statistical characteristics of the different categories can be regarded as prior knowledge. When objects of the same category appear in expected and predictable shapes, our loss function, benefiting from this prior knowledge, is helpful for the corresponding segmentation task. We note that the prior knowledge is integrated into the loss function by setting the corresponding weight matrix of each category. In the following, in order to further test the effect of prior knowledge and the cut loss, we set three different weight matrices for the longitudinal precipitates by selecting different correlation degrees τLxy = (−2, −2.5, −3) in (13). The final prediction results are shown in Fig. 10.
In Fig. 10, the first column shows the overall prediction
results under the three correlation degrees, followed by the
results of the longitudinal precipitates and the enlarged view.
Intuitively, the outputs of the network are quite different. Specifically, when τLxy = −2, the predicted shapes of the longitudinal precipitates are relatively blunt, whereas when we set τLxy = −3, they become sharp. This shows that, through continued learning, the loss function can control the shape of the prediction results. At the same time, it implies that the gradient-descent-based iterative method is, to a certain extent, effective in optimizing the objective function of normalized graph cuts. That is to say, by setting a weight matrix that encodes prior knowledge in the cut loss, we can control the shape of the predicted segmentation. Different from the pixel-level cross-entropy loss, this loss function is more consistent with human visual perception, and the relevant prior knowledge is naturally integrated into the end-to-end training of the deep network without additional postprocessing modules or complex inference processes.

Fig. 10. Prediction results for longitudinal precipitations under different correlation degrees [τLxy = (−2, −2.5, −3)].

More importantly, the cut loss can be employed in any conventional image segmentation network, beyond the instance segmentation framework of this article. The cut loss function may thus be an appropriate complement to the cross-entropy loss when the shapes of the categories are statistically significant.
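To make the weight-matrix mechanism concrete, the following is a minimal PyTorch sketch of a soft normalized-cut loss in the spirit of (10) and the relaxation of Tang et al. [54], with an anisotropy parameter standing in for the category correlation degree of (13). The Gaussian affinity, the axis stretching, and all numeric values are illustrative assumptions, not the exact weight matrix of this article.

import torch

def gaussian_affinity(h, w, sigma=8.0, stretch_x=1.0):
    """Dense spatial affinity W on an h x w grid (illustrative only).

    stretch_x rescales horizontal distances, so an elongated (longitudinal)
    class can be given a direction-dependent affinity, loosely mimicking
    the category correlation degree described in the text.
    """
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    coords = torch.stack([ys.flatten().float(),
                          xs.flatten().float() * stretch_x], dim=1)
    d2 = torch.cdist(coords, coords) ** 2
    return torch.exp(-d2 / (2.0 * sigma ** 2))

def soft_ncut_loss(prob, W):
    """prob: (h*w,) foreground probabilities; ncut = S^T W (1-S) / (d^T S)."""
    d = W.sum(dim=1)                    # vertex degrees
    cut = prob @ (W @ (1.0 - prob))     # soft cut between S and its complement
    assoc = prob @ d                    # soft association of S with the graph
    return cut / (assoc + 1e-6)

# Usage on a 28 x 28 mask-layer output; in training this term would be
# combined with cross-entropy and the boundary loss (14), e.g.,
# total = ce + a * ncut + b * boundary.
h = w = 28
W = gaussian_affinity(h, w, sigma=8.0, stretch_x=2.0)
prob = torch.rand(h * w, requires_grad=True)
loss = soft_ncut_loss(prob, W)
loss.backward()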
D. Limitations
In general, based on deep learning, we proposed a novel framework for the measurement of precipitates. There are some possible limitations in this study. First, from the perspective of experimental materials, our metallographic dataset contains only 300 nanometer-level TEM images. Admittedly, this is relatively small compared with popular natural image datasets such as MS COCO (330k images). Considering the high cost of specimen preparation and expert annotation, it is difficult to obtain a large number of metallographic images, so how to achieve better performance on such a small dataset is worth further study. Second, from the perspective of the deep network, strictly speaking, the tuning of hyperparameters should rely on the performance on a validation set. However, in our work, the hyperparameters were selected in an ad hoc manner without such a validation set. As a result, the reported results may not fully generalize. In fact, finding proper hyperparameters can be very difficult, especially for a model with as many hyperparameters as the one in this work. Finally, the introduction of the proposed cut loss (10) inevitably decreases computational efficiency, although this degradation (from about 0.55 to 0.75 s/image) is basically acceptable. In future work, to address the generalization problem, some learning strategies are worthy
of attention, such as few-shot learning and transfer learning.
As for the selection of hyperparameters, automatic machine
learning [62] seems to be a good solution. These may be the
key to the practical application of our automatic measurement
methods in materials science.
IV. CONCLUSION
In this article, we proposed a novel framework for the
measurement of precipitates in aluminum alloys. It is a
two-stage instance segmentation network, which is based on
mask R-CNN and consists of the backbone network, RPN,
and the RoI network. For the RoI network, considering that
the size distributions of different precipitate categories have
obvious differences, we input the size information, based on the bounding box area, into the classification and regression layers of the RoI network through a simple skip connection.
Besides, since the shape of precipitates is predictable, the
proposed cut loss function, including prior knowledge, and
the boundary loss function for measuring contour distance
are designed to segment the mask in the mask layer. In fact, our framework improves the basic mask R-CNN in terms of both topological structure and loss function, based on the prior knowledge (size and shape) of the categories. As a result, we call
the proposed framework prior mask R-CNN. From a practical
point of view, we design a simple postprocessing module
to extract material information based on the region-growing
algorithm and key points’ detection. As for the experiments,
our method achieves an mAP score of 0.475 in the object
detection task and an mAP score of 0.298 in the mask
segmentation task, surpassing the other comparison methods.
In addition, the length information of precipitates obtained
from the output of our network is more consistent with that
annotated by experts. This should be attributed to the designed
structure and loss function of our method. In the ablation study,
we tested these designs separately and explored the relevance
of the proposed cut loss to the predicted shape. In summary,
when the shapes and sizes of the objects are predictable in
advance, our framework named prior mask R-CNN provides
a new idea to improve performance for automatic measurement
by fusing prior knowledge.
REFERENCES
[1] L. P. Troeger and E. A. Starke, “Microstructural and mechanical char-
acterization of a superplastic 6xxx aluminum alloy,” Mater. Sci. Eng.,
A, vol. 277, nos. 1–2, pp. 102–113, Jan. 2000.
[2] T. Hemalatha, S. Akilandeswari, T. Krishnakumar, S. G. Leonardi,
G. Neri, and N. Donato, “Comparison of electrical and sensing proper-
ties of pure, Sn- and Zn-doped CuO gas sensors,” IEEE Trans. Instrum.
Meas., vol. 68, no. 3, pp. 903–912, Mar. 2019.
[3] F. Liu, F. Yu, D. Zhao, and L. Zuo, “Microstructure and mechanical
properties of an Al-12.7Si-0.7Mg alloy processed by extrusion and heat
treatment,” Mater. Sci. Eng. A., vol. 528, pp. 3786–3790, Apr. 2011.
[4] K. Simonyan and A. Zisserman, “Very deep convolutional networks for
large-scale image recognition,” in Proc. Int. Conf. Learn. Represent.,
2015, pp. 1–14.
[5] Y. Liu et al., “Richer convolutional features for edge detection,” IEEE
Trans. Pattern Anal. Mach. Intell., vol. 41, no. 8, pp. 1939–1946,
Aug. 2019.
[6] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look
once: Unified, real-time object detection,” in Proc. IEEE Conf. Comput.
Vis. Pattern Recognit., Jun. 2016, pp. 779–788.
[7] V. Badrinarayanan, A. Kendall, and R. Cipolla, “SegNet: A deep
convolutional encoder-decoder architecture for image segmentation,”
IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 12, pp. 2481–2495,
Dec. 2017.
[8] S. M. Azimi, D. Britz, M. Engstler, M. Fritz, and F. Mücklich,
“Advanced steel microstructural classification by deep learning meth-
ods,” Sci. Rep., vol. 8, no. 1, pp. 1–14, Dec. 2018.
[9] C. Wang, D. Shi, and S. Li, “A study on establishing a microstructure-
related hardness model with precipitate segmentation using deep learning
method,” Materials, vol. 13, no. 5, p. 1256, Mar. 2020.
[10] M. Li, D. Chen, S. Liu, and F. Liu, “Grain boundary detection
and second phase segmentation based on multi-task learning and
generative adversarial network,” Measurement, vol. 162, Oct. 2020,
Art. no. 107857.
[11] K. Gajalakshmi, S. Palanivel, N. J. Nalini, S. Saravanan, and
K. Raghukandan, “Grain size measurement in optical microstruc-
ture using support vector regression,” Optik, vol. 138, pp. 320–327,
Jun. 2017.
[12] O. Dengiz, A. E. Smith, and I. Nettleship, “Grain boundary detection in
microstructure images using computational intelligence,” Comput. Ind.,
vol. 56, nos. 8–9, pp. 854–866, Dec. 2005.
[13] X. Zhenying, Z. Jiandong, Z. Qi, and P. Yamba, “Algorithm based
on regional separation for automatic grain boundary extraction using
improved mean shift method,” Surf. Topography, Metrology Properties,
vol. 6, no. 2, Apr. 2018, Art. no. 025001.
[14] H. Peregrina-Barreto, I. R. Terol-Villalobos, J. J. Rangel-Magdaleno ,
A. M. Herrera-Navarro, L. A. Morales-Hernández, and
F. Manríquez-Guerrero, “Automatic grain size determination in
microstructures using image processing,” Measurement, vol. 46,
no. 1, pp. 249–258, Jan. 2013.
[15] B. Lu, M. Cui, Q. Liu, and Y. Wang, “Automated grain boundary
detection using the level set method,” Comput. Geosci., vol. 35, no. 2,
pp. 267–275, Feb. 2009.
[16] B. Ma et al., “Fast-FineCut: Grain boundary detection in micro-
scopic images considering 3D information,” Micron, vol. 116, pp. 5–14,
Jan. 2019.
[17] C. A. Paredes-Orta, J. D. Mendiola-Santibañez, F. Manriquez-Guerrero,
and I. R. Terol-Villalobos, “Method for grain size determination in
carbon steels based on the ultimate opening,” Measurement, vol. 133,
pp. 193–207, Feb. 2019.
[18] L. Liu et al., “Deep learning for generic object detection: A survey,”
Int. J. Comput. Vis., vol. 128, no. 2, pp. 261–318, Jan. 2020.
[19] B. Wang et al., “Automatic fault diagnosis of infrared insulator images
based on image instance segmentation and temperature analysis,” IEEE
Trans. Instrum. Meas., vol. 69, no. 8, pp. 5345–5355, Aug. 2020.
[20] J. Ma, K. Qian, X. Zhang, and X. Ma, “Weakly supervised instance
segmentation of electrical equipment based on RGB-T automatic anno-
tation,” IEEE Trans. Instrum. Meas., vol. 69, no. 12, pp. 9720–9731,
Dec. 2020.
[21] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature
hierarchies for accurate object detection and semantic segmentation,” in
Proc. IEEE Conf. Comput. Vis. Pattern Recog., Jun. 2014, pp. 580–587.
[22] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask R-CNN,” in Proc.
IEEE Int. Conf. Comput. Vis., Oct. 2017, pp. 2961–2969.
[23] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards
real-time object detection with region proposal networks,” IEEE Trans.
Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, Jun. 2017.
[24] T.-Y. Lin et al., “Microsoft COCO: Common objects in context,” in Proc.
Eur. Conf. Comput. Vis., 2014, pp. 740–755.
[25] E. Shelhamer, J. Long, and T. Darrell, “Fully convolutional networks
for semantic segmentation,” IEEE Trans. Pattern Anal. Mach. Intell.,
vol. 39, no. 4, pp. 640–651, 2017.
[26] S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia, “Path aggregation network for
instance segmentation,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern
Recognit., Jun. 2018, pp. 8759–8768.
[27] Z. Cai and N. Vasconcelos, “Cascade R-CNN: High quality object
detection and instance segmentation,” IEEE Trans. Pattern Anal. Mach.
Intell., vol. 43, no. 5, pp. 1483–1498, May 2021.
[28] K. Chen et al., “Hybrid task cascade for instance segmentation,” in Proc.
IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019,
pp. 4974–4983.
[29] Z. Huang, L. Huang, Y. Gong, C. Huang, and X. Wang, “Mask scoring
R-CNN,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit.
(CVPR), Jun. 2019, pp. 6409–6418.
[30] S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Trans.
Knowl. Data Eng., vol. 22, no. 10, pp. 1345–1359, Oct. 2010.
[31] Y. Guo, H. Shi, A. Kumar, K. Grauman, T. Rosing, and R. Feris,
“SpotTune: Transfer learning through adaptive fine-tuning,” in Proc.
IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019,
pp. 4805–4814.
[32] Z. Murez, S. Kolouri, D. Kriegman, R. Ramamoorthi, and K. Kim,
“Image to image translation for domain adaptation,” in Proc. IEEE/CVF
Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 4500–4509.
[33] Y. Wang, Q. Yao, J. T. Kwok, and L. M. Ni, “Generalizing from a few
examples: A survey on few-shot learning,” ACM Comput. Surv., vol. 53,
no. 3, pp. 1–34, Jul. 2020.
[34] Z. Mirikharaji and G. Hamarneh, “Star shape prior in fully convolu-
tional networks for skin lesion segmentation,” in Proc. MICCAI, 2018,
pp. 737–745.
[35] S. Y. Han, H. J. Kwon, Y. Kim, and N. I. Cho, “Noise-robust pupil center
detection through CNN-based segmentation with shape-prior loss,” IEEE
Access, vol. 8, pp. 64739–64749, 2020.
[36] C. Zotti, Z. Luo, O. Humbert, A. Lalande, and P. M. Jodoin, “GridNet
with automatic shape prior registration for automatic MRI cardiac
segmentation,” in Proc. STACOM-MICCAI, 2017, pp. 73–81.
[37] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille,
“DeepLab: Semantic image segmentation with deep convolutional nets,
atrous convolution, and fully connected CRFs,” IEEE Trans. Pattern
Anal. Mach. Intell., vol. 40, no. 4, pp. 834–848, Apr. 2018.
[38] S. Zheng et al., “Conditional random fields as recurrent neural net-
works,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015,
pp. 1529–1537.
[39] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for
image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit.
(CVPR), Jun. 2016, pp. 770–778.
[40] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi, “Inception-v4,
Inception-ResNet and the impact of residual connections on learning,”
in Proc. AAAI Conf. Artif. Intell., 2016, pp. 4278–4284.
[41] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional net-
works for biomedical image segmentation,” in Medical Image Comput-
ing and Computer-Assisted Intervention—MICCAI, 2015, pp. 234–241.
[42] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie,
“Feature pyramid networks for object detection,” in Proc. IEEE Conf.
Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 2117–2125.
[43] D. Bolya, C. Zhou, F. Xiao, and Y. J. Lee, “YOLACT: Real-time instance
segmentation,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV),
Oct. 2019, pp. 9157–9166.
[44] X. Chen, R. Girshick, K. He, and P. Dollár, “TensorMask: A foundation
for dense object segmentation,” in Proc. IEEE/CVF Int. Conf. Comput.
Vis. (ICCV), Oct. 2019, pp. 2061–2069.
[45] Y. Zhang and Q. Yang, “A survey on multi-task learning,” 2017,
arXiv:1707.08114. [Online]. Available: http://arxiv.org/abs/1707.08114
[46] S. Xie and Z. Tu, “Holistically-nested edge detection,” in Proc. IEEE
Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 1395–1403.
[47] H. Kervadec, J. Bouchtiba, C. Desrosiers, E. Granger, J. Dolz, and
I. B. Ayed, “Boundary loss for highly unbalanced segmentation,” in
Proc. Int. Conf. Med. Imag. Deep Learn., 2019, pp. 285–296.
[48] F. Milletari, N. Navab, and S.-A. Ahmadi, “V-Net: Fully convolutional
neural networks for volumetric medical image segmentation,” in Proc.
4th Int. Conf. 3D Vis. (3DV), Oct. 2016, pp. 565–571.
[49] R. Zhang et al., “Automatic segmentation of acute ischemic stroke
from DWI using 3-D fully convolutional DenseNets,” IEEE Trans. Med.
Imag., vol. 37, no. 9, pp. 2149–2160, Sep. 2018.
[50] D. Hao et al., “Sequential vessel segmentation via deep channel attention
network,” Neural Netw., vol. 128, pp. 172–187, Aug. 2020.
[51] D. Karimi and S. E. Salcudean, “Reducing the Hausdorff distance in
medical image segmentation with convolutional neural networks,” IEEE
Trans. Med. Imag., vol. 39, no. 2, pp. 499–513, Feb. 2020.
[52] J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time
style transfer and super-resolution,” in Proc. Eur. Conf. Comput. Vis.,
2016, pp. 694–711.
[53] J. Shi and J. Malik, “Normalized cuts and image segmentation,”
IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 8, pp. 888–905,
Aug. 2000.
[54] M. Tang, A. Djelouah, F. Perazzi, Y. Boykov, and C. Schroers,
“Normalized cut loss for weakly-supervised CNN segmentation,” in
Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018,
pp. 1818–1827.
[55] P. Krähenbühl and V. Koltun, “Efficient inference in fully connected
CRFs with Gaussian edge potentials,” in Proc. Adv. Neural Inf. Process.
Syst., 2011, pp. 109–117.
[56] J. Ma, “Segmentation loss odyssey,” 2020, arXiv:2005.13449. [Online].
Available: http://arxiv.org/abs/2005.13449
[57] Z.-H. Zhou, “A brief introduction to weakly supervised learning,” Nat.
Sci. Rev., vol. 5, no. 1, pp. 44–53, Jan. 2018.
[58] R. Adams and L. Bischof, “Seeded region growing,” IEEE Trans. Pattern
Anal. Mach. Intell., vol. 16, no. 6, pp. 641–647, Jun. 1994.
[59] M. Everingham, S. M. A. Eslami, L. Van Gool, C. K. I. Williams,
J. Winn, and A. Zisserman, “The Pascal visual object classes challenge:
A retrospective,” Int. J. Comput. Vis., vol. 111, no. 1, pp. 98–136,
Jan. 2015.
[60] K. Chen et al., “MMDetection: Open MMLab detection tool-
box and benchmark,” 2019, arXiv:1906.07155. [Online]. Available:
http://arxiv.org/abs/1906.07155
[61] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,”
in Proc. Int. Conf. Learn. Represent., 2015, pp. 1–41.
[62] X. He, K. Zhao, and X. Chu, “AutoML: A survey of the state-of-the-art,”
Knowl.-Based Syst., vol. 212, Jan. 2021, Art. no. 106622.
Mingchun Li received the B.S. and M.S. degrees in
automation from Northeastern University, Shenyang,
China, in 2015 and 2018, respectively, where he is
currently pursuing the Ph.D. degree with the College
of Information Science and Engineering.
His research lies at the intersection of machine
learning and image processing. His current research
work is about medical signals and industrial intelli-
gence based on deep learning.
Dali Chen received the B.S., M.S., and Ph.D.
degrees in automation, pattern recognition, and
intelligent systems from Northeastern University,
Shenyang, China, in 2003, 2005, and 2008, respec-
tively.
He is currently an Associate Professor with the
College of Information Science and Engineering,
Northeastern University. His research lies at the
intersection of machine learning and image process-
ing. His current research interest is to develop deep
learning algorithms for medical image processing
and industrial intelligent systems.
Shixin Liu (Member, IEEE) received the B.S.
degree in mechanical engineering from Southwest
Jiaotong University, Sichuan, China, in 1990, and
the M.S. and Ph.D. degrees in systems engineer-
ing from Northeastern University, Shenyang, China,
in 1993 and 2000, respectively.
He is currently a Professor with the College of
Information Science and Engineering, Northeastern
University. He has authored or coauthored over
100 publications, including one book. His research
interests are in intelligent optimization algorithms,
planning and scheduling, machine learning, and computer vision.
Fang Liu received the B.S. and Ph.D. degrees
in materials science from Northeastern University,
Shenyang, China, in 2004 and 2013, respectively.
She is currently a Lecturer with the School
of Materials Science and Engineering, Northeast-
ern University. Her current research interests are
wrought aluminum–silicon alloy and alloy design
based on finely dispersed second-phase particles
strengthening matrix.