Semiconductor defect classification using hyperellipsoid clustering neural networks and model switching

Keisuke Kameyama
Interdisciplinary Graduate School of Sci. and Eng.
Tokyo Institute of Technology
Yokohama 226-8502, Japan
Yukio Kosugi
Frontier Collaborative Research Center
Tokyo Institute of Technology
Yokohama 226-8503, Japan
Abstract
An automatic defect classification (ADC) system for visual inspection of semiconductor wafers using a neural network classifier is introduced. The proposed Hyperellipsoid Clustering Network (HCN), which employs a Radial Basis Function (RBF) in the hidden layer, is trained with additional penalty conditions for recognizing unfamiliar inputs as originating from an unknown defect class. Also, by using a dynamic model alteration method called Model Switching, a reduced-model classifier that enables efficient classification is obtained. In the experiments, the effectiveness of the unfamiliar input recognition was confirmed, and a classification rate sufficiently high for use in the semiconductor fab was obtained.
1. Introduction
Visual inspection plays an important role in the manufacturing processes of semiconductors. The disorders found on the wafer surface, such as the one shown in Fig. 1, are commonly referred to as defects. The motive for defect classification is to find the process stages and the sources that are causing them. Early detection of the sources of defects is essential for maintaining high product yield and quality.
Replacing the review process typically conducted by human experts is also intended to improve both the stability and the speed of inspection. In the literature, it is reported that the classification accuracies of human experts are typically 60–80% [1]. If this stage of visual inspection could be automated, it would greatly enhance the productivity of the semiconductor fab.
The task of classifying the defect image features has several specific conditions inherent to the particular problem. Most distinctive among them is the fact that the user does not have the freedom of collecting a sufficient number of, or an appropriate selection of, training images. Also, the numbers of training samples per class are extremely unbalanced.
Figure 1. A defect found on a semiconductor wafer (scale bar: 5 µm).
When the number of samples for a defect class is small, approaches whose decisions rely on all samples, such as the radial basis function (RBF) networks [10][12] or the joint use of nonparametric estimation of the probability distribution function by Parzen's method [11] and Bayes classification, perform well. However, for a class with a large number of samples, these methods are computationally costly. In this case, instead of using all the training samples for classification, methods based on distances from the class-cluster prototypes, such as the nearest neighbor algorithm [2] and learning vector quantization [9], and those based on class borders, such as multilayer perceptrons (MLP) [14] and support vector machines [15], are computationally more efficient. So-called reduced variants of the above nonparametric methods, such as the generalized RBF networks [12] and reduced Parzen classifiers [3], also depend on the distances from the prototypes.
In this work, a three-layered neural network named the Hyperellipsoid Clustering Network (HCN), having hidden layer units of RBF type, is used. In addition to parameter adjustment by the backpropagation (BP) method [14], a model alteration method called Model Switching (MS) [7], which allows the map acquired by training to be inherited by the new model, is used during the training process to efficiently obtain an appropriate reduced model.
The second requirement for the system is to classify the known defect classes without fail and not to make wild guesses about unfamiliar defects. Such cases should be pointed out as unclassifiable and left for the human expert to examine. Since the training set usually provides answers for only a small portion of the feature space, inputs falling in the remaining open space should be treated as unknown. For recognizing unfamiliar inputs, the HCN was trained with an additional penalty condition, so that the sizes of the hyperellipsoid kernels are kept small and tightly enclose the clusters formed by the training samples.
In Sec. 2, the HCN will be introduced, together with its
training method and the output interpretation method for
recognition of unfamiliar inputs. In Sec. 3, the idea of
Model Switching for allowing dynamic model alteration
during training will be reviewed. The defect classes and the
outline of the automatic defect classification (ADC) system
will be explained in Sec. 4. In Sec. 5, the network and the ADC system will be evaluated by applying them to the classification of the defect image sets, and the paper will be concluded in Sec. 6.
2. Hyperellipsoid clustering network (HCN)
The three-layered network model used for classifying the feature vectors is illustrated in Fig. 2. The network has $L$ inputs, $N$ hidden units and $O$ output units. The potential of the $n$-th hidden layer unit is defined as,
$u_n = r_n^2 - \| H_n (\mathbf{x} - \mathbf{m}_n) \|^2$    (1)
with the following parameters to be adjusted in the training:
$r_n \in \mathbb{R}$: radius parameter.
$\mathbf{m}_n = (m_{n1}, \ldots, m_{nL})^T \in \mathbb{R}^L$: center vector.
$H_n \in \mathbb{R}^{L \times L}$: weight matrix.
The transfer function of the hidden layer unit is the well-known sigmoid function. Thus, the output of unit $n$ is,
$h_n = f(u_n) = \{1 + \exp(-u_n)\}^{-1}$.    (2)
Figure 2. The Hyperellipsoid Clustering Network (HCN). The input vector $\mathbf{x}$ enters the input layer (units $l = 1, \ldots, L$), passes through the hidden layer (units $n = 1, \ldots, N$; hyperellipsoid discriminant + sigmoid; parameters $(H_n, \mathbf{m}_n, r_n)$) and the linear output layer (units $k = 1, \ldots, O$; connection weights $\mathbf{w}_k$), which gives the output vector $\mathbf{y}$.
Figure 3. An example of the kernel functions made by the joint use of (hyper)ellipsoid discriminants and sigmoid functions (two kernels $h_1$ and $h_2$ over the $(x_1, x_2)$ plane, with outputs between 0 and 1.0).
A unit in the output layer takes the fan-out of the hidden layer units and calculates the weighted sum with no bias as,
$y_k = \mathbf{w}_k^T \mathbf{h}$    (3)
where $\mathbf{w}_k = (w_{k1}, \ldots, w_{kN})^T \in \mathbb{R}^N$ is the weight vector of the $k$-th output unit, and $\mathbf{h} = (h_1, \ldots, h_N)^T \in \mathbb{R}^N$ is the vector of the hidden layer unit outputs. The weight vector is also modified in the training process.
By employing the discriminant in Eq. (1), the discrimination plane in the feature space will always be a hyperellipsoid. Since the unit potential in Eq. (1) depends on the distance between the input $\mathbf{x}$ and the center vector $\mathbf{m}_n$, the network is an RBF network. However, in contrast with the popular Gaussian RBF network [12], various profiles of the kernel function are possible by controlling the gain [4] of the sigmoid function with the radius parameter $r_n$, as shown in Fig. 3. This network model using the hyperellipsoid discriminant and the sigmoid function in the hidden layer will be referred to as the Hyperellipsoid Clustering Network (HCN).
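As a minimal sketch of the forward computation of Eqs. (1) to (3), the following Python code evaluates a small network, assuming that the potential has the form u_n = r_n^2 - ||H_n (x - m_n)||^2 and that the hidden output is the standard sigmoid of u_n. The function names and all numerical values are illustrative assumptions and are not taken from the paper.

```python
import math

def hidden_output(x, H, m, r):
    """Kernel response of one hidden unit: sigmoid(r^2 - ||H (x - m)||^2)."""
    d = [xi - mi for xi, mi in zip(x, m)]
    Hd = [sum(hij * dj for hij, dj in zip(row, d)) for row in H]
    u = r * r - sum(v * v for v in Hd)          # assumed form of the potential
    return 1.0 / (1.0 + math.exp(-u))

def hcn_forward(x, units, W):
    """units: list of (H, m, r); W: one weight vector per output unit, no bias."""
    h = [hidden_output(x, H, m, r) for H, m, r in units]
    return [sum(wk * hk for wk, hk in zip(w, h)) for w in W]

# Two isotropic kernels in a 2-D feature space (hypothetical values).
units = [
    ([[4.0, 0.0], [0.0, 4.0]], [0.2, 0.2], 1.0),
    ([[4.0, 0.0], [0.0, 4.0]], [0.8, 0.8], 1.0),
]
W = [[1.0, 0.0], [0.0, 1.0]]                    # each output reads one kernel

y_near = hcn_forward([0.2, 0.2], units, W)      # input at the first center
y_far = hcn_forward([0.5, 0.5], units, W)       # input between the two kernels
```

An input placed at the center of the first kernel receives the response sigmoid(r^2) from that kernel and almost none from the other, while an input midway between the kernels yields two equally weak responses, which is the situation the membership threshold is meant to catch.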
The training method used in the HCN is based on the batched BP law with momentum terms [14]. The error criterion $E$ is defined as
$E = \frac{1}{P} \sum_{p=1}^{P} E_p = \frac{1}{2P} \sum_{p=1}^{P} \| \mathbf{t}_p - \mathbf{y}_p \|^2$    (4)
with $P$, $E_p$, $\mathbf{t}_p \in \mathbb{R}^O$ and $\mathbf{y}_p \in \mathbb{R}^O$ denoting the cardinality of the training set, the error for the $p$-th training pair, the $p$-th training output vector and the $p$-th output vector, respectively.
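The batch error can be illustrated with a short Python sketch. It assumes the usual BP convention that the per-pattern error is half the squared Euclidean distance between the target and the output, which is an assumption about the exact form of Eq. (4); the function name and the sample vectors are hypothetical.

```python
def batch_error(targets, outputs):
    """Return (E, [E_p]) with E_p = 0.5 * ||t_p - y_p||^2 and E the mean of E_p."""
    per_pattern = [
        0.5 * sum((t - y) ** 2 for t, y in zip(tp, yp))
        for tp, yp in zip(targets, outputs)
    ]
    return sum(per_pattern) / len(per_pattern), per_pattern

# Two training pairs with class-specific unit vectors as targets.
targets = [[1.0, 0.0], [0.0, 1.0]]
outputs = [[0.8, 0.1], [0.3, 0.6]]
E, E_p = batch_error(targets, outputs)
```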
For enabling a “tight bounding by hyperellipsoids” to implement the recognition of the unfamiliar inputs, the volume of the hyperellipsoids should be kept small as long as this does not harm the achievement of training. This can be done by setting a penalty term to restrict the radius of the hyperellipsoids. The distance from the center to the edge of the hyperellipsoid in the direction of the $i$-th principal component can be written as $|r_n| / \sqrt{\lambda_i}$, where $\lambda_i$ is the $i$-th eigenvalue of the matrix $H_n^T H_n$, which is always positive. Thus, a penalty to suppress the absolute value of the radius parameter $|r_n|$ can be considered to be effective. Also, a term to prevent the eigenvalues from becoming too small was necessary. This second restriction was implemented indirectly by preventing the Euclidean norm of the matrix $H_n$ from becoming too small. Consequently, the modification measures to the weight matrix $H_n$ and the radius parameter $r_n$ were formulated as,
$\Delta H_n = \Delta H_n^{BP} + \varepsilon_H \cdot (\text{term opposing a decrease of } \|H_n\|)$    (5)
and
$\Delta r_n = \Delta r_n^{BP} - \varepsilon_r \cdot (\text{term suppressing } |r_n|)$    (6)
with the terms $\Delta H_n^{BP}$ and $\Delta r_n^{BP}$ denoting the modification measures by the plain BP training. Parameters $\varepsilon_H$ and $\varepsilon_r$ denote the penalty term gains.
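The semi-axis length quoted above follows in one step if the potential is taken to have the form $u_n = r_n^2 - \| H_n (\mathbf{x} - \mathbf{m}_n) \|^2$, which is the form assumed here for Eq. (1). The derivation below is a sketch under that assumption; $\mathbf{e}_i$ (a unit eigenvector of $H_n^T H_n$), $d$ and $V_L$ (the volume of the unit ball in $L$ dimensions) are symbols introduced only for this illustration.

```latex
% Kernel boundary: u_n = 0
(\mathbf{x}-\mathbf{m}_n)^{T} H_n^{T} H_n (\mathbf{x}-\mathbf{m}_n) = r_n^{2}

% Move a distance d from the center along the unit eigenvector e_i
% of H_n^T H_n (eigenvalue \lambda_i > 0):
\mathbf{x} = \mathbf{m}_n + d\,\mathbf{e}_i
\;\Rightarrow\; \lambda_i d^{2} = r_n^{2}
\;\Rightarrow\; d = \frac{|r_n|}{\sqrt{\lambda_i}}

% Volume enclosed by the boundary (product of the L semi-axes):
V = V_L \prod_{i=1}^{L} \frac{|r_n|}{\sqrt{\lambda_i}}
  = V_L \, \frac{|r_n|^{L}}{|\det H_n|}
```

Under this reading, the enclosed volume shrinks when $|r_n|$ decreases and grows when the eigenvalues of $H_n^T H_n$ decrease, which is why the two penalties act in opposite directions on $r_n$ and on the norm of $H_n$.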
The network will be trained to respond with a class-specific unit vector. Since the output is the weighted sum of the kernel functions of the hidden layer units, it can be justified to reject an output vector that does not have a significant winner. In such a case, the input pattern should be classified as originating from an unknown class. Therefore, the output interpretation of,
$\mathrm{Class}(\mathbf{x}) = \begin{cases} \arg\max_k (y_k) & \text{if } \max_k (y_k) \geq \theta \\ \text{unknown} & \text{otherwise} \end{cases}$    (7)
will be used, with $\theta$ being the membership threshold.
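The output interpretation with a membership threshold can be sketched in a few lines of Python. The function name is hypothetical, `None` is used here to stand for the unknown class, and the use of a non-strict comparison against the threshold is an assumption.

```python
def interpret_output(y, theta):
    """Return the index of the winning output unit, or None for 'unknown'."""
    k = max(range(len(y)), key=lambda i: y[i])   # index of the largest output
    return k if y[k] >= theta else None          # reject weak winners

confident = interpret_output([0.10, 0.85, 0.05], theta=0.5)   # clear winner
rejected = interpret_output([0.20, 0.30, 0.10], theta=0.5)    # no significant winner
```

Raising the threshold trades recognition of borderline samples for a larger rejected region of the input domain.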
3. Model switching
As a method for obtaining a reduced network model in the learning process, a model alteration scheme called Model Switching (MS) [7] is employed. MS is a framework for dynamic model alteration during the BP training that improves the training efficiency by avoiding local minima and reducing the redundancy in the network model.
Definition 1 (Model Switching) Methods of altering the neural network model which determine the moment or the occasion of model alteration by taking into account both of the following factors:
1. The nature and fitness of the new model and the initial map candidate within the new model.
2. The status of the immediate model and map.
will be referred to as Model Switching (MS).
In this work, MS will be used to reduce the number of hidden layer units in the HCN, in which the training is initially started with a model having the same number of units as the training samples. Pruning algorithms [13], which are also attempts to reduce the network size, mostly limit the occasion of model reduction to after the convergence of the training error. With MS, however, the occasion can be set at any time, as long as the fitness of the candidate of the initial map within the new model is met. When only the model reduction is used in MS, only the first factor in Def. 1 needs to be considered.
The process of training by BP with MS is shown in Fig. 4. For each training epoch of BP, the fitness of the switchable candidates will be evaluated, and switching will take place when the fitness $I_F$ of a candidate exceeds a given threshold $I_{F0}$.
The candidate set of the new model and map was made by using the unit reduction method of unit fusion [6]. Unit fusion selects a pair of units in a layer and replaces them by a single unit. On replacement, connection weights to the new unit are determined so that the map of the old network will be inherited by the new network.
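The training procedure of Fig. 4 can be sketched as a generic loop. This is only a skeleton under stated assumptions: `bp_epoch`, `candidates`, `fitness` and `error` are hypothetical callables supplied by the caller, and the toy model at the bottom (a dictionary with a unit count and an error value) exists purely to make the loop runnable.

```python
def train_with_model_switching(model, bp_epoch, candidates, fitness,
                               error, e0, if0, max_epochs=1000):
    """BP training with Model Switching, following the flow of Fig. 4."""
    for _ in range(max_epochs):
        model = bp_epoch(model)                         # modify parameters by BP
        scored = [(fitness(model, c), c) for c in candidates(model)]
        if scored:
            best_fit, best = max(scored, key=lambda fc: fc[0])
            if best_fit > if0:                          # switch to the fittest candidate
                model = best
        if error(model) < e0:                           # trained?
            break
    return model

# Toy stand-ins (assumptions): each epoch halves the error, and one candidate
# with one unit fewer is always offered until three units remain.
def toy_bp_epoch(m):
    return {"units": m["units"], "err": m["err"] * 0.5}

def toy_candidates(m):
    if m["units"] > 3:
        return [{"units": m["units"] - 1, "err": m["err"]}]
    return []

final = train_with_model_switching(
    {"units": 6, "err": 1.0}, toy_bp_epoch, toy_candidates,
    fitness=lambda m, c: 0.9, error=lambda m: m["err"], e0=0.01, if0=0.5)
```

Because switching is tested in every epoch rather than after convergence, the model size and the training error decrease together in this toy run.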
Figure 4. BP training with Model Switching. Flow: Start; modify the parameters of the immediate network $f_N$ by BP; determine the switchable model-map candidate set $C_{MS}$; evaluate the fitness index $I_F(f_N, f_{N_i})$ for all $f_{N_i} \in C_{MS}$; if $\max\{I_F(f_N, f_{N_i})\} > I_{F0}$, switch $f_N$ to $f_{N_k}$ with $k = \arg\max_i \{I_F(f_N, f_{N_i})\}$ (model size reduction), otherwise no switching; if trained ($E < E_0$), End, otherwise return to the BP step.
Let us suppose that units indexed $i$ and $j$ will be fused to make a single unit $i$. The weighted sum of the inputs from units $i$, $j$ and the unity bias $b$ to the subsequent layer unit $k$ can be written as,
$u_k = w_{ki} y_i + w_{kj} y_j + w_{kb} = w_{ki} (\bar{y}_i + v_i) + w_{kj} (\bar{y}_j + v_j) + w_{kb}$    (8)
where $w$, $y$, $\bar{y}$ and $v$ are the connection weight, unit response, average unit response and the varying portion of the response, respectively. Generally, we can put,
$v_j \simeq \mathrm{sgn}(\rho_{ij}) \frac{\sigma_j}{\sigma_i} v_i$    (9)
with $\sigma$ and $\rho_{ij}$ denoting the standard deviation of the unit output, and the output similarity of the unit pair, respectively, both evaluated for all the training inputs. From Eqs. 8 and 9, we have
$u_k \simeq \{ w_{ki} + \mathrm{sgn}(\rho_{ij}) \frac{\sigma_j}{\sigma_i} w_{kj} \} y_i + w_{kb} + w_{kj} \{ \bar{y}_j - \mathrm{sgn}(\rho_{ij}) \frac{\sigma_j}{\sigma_i} \bar{y}_i \}$    (10)
implying that the connection weights should be changed as,
$w'_{ki} = w_{ki} + \mathrm{sgn}(\rho_{ij}) \frac{\sigma_j}{\sigma_i} w_{kj}$    (11)
and
$w'_{kb} = w_{kb} + w_{kj} \{ \bar{y}_j - \mathrm{sgn}(\rho_{ij}) \frac{\sigma_j}{\sigma_i} \bar{y}_i \}$    (12)
where the primes denote the connection weights after the fusion.
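The weight compensation of unit fusion can be checked numerically. The sketch below assumes that the varying part of unit j is approximated as sgn(rho_ij) * (sigma_j / sigma_i) times that of unit i, and that the compensated weights take the form w'_ki = w_ki + sgn(rho_ij) (sigma_j / sigma_i) w_kj and w'_kb = w_kb + w_kj (mean_j - sgn(rho_ij) (sigma_j / sigma_i) mean_i). The function names and sample responses are hypothetical, and the sign of the covariance is used as a stand-in for the sign of the output similarity.

```python
import math

def mean(v):
    return sum(v) / len(v)

def std(v):
    mu = mean(v)
    return math.sqrt(sum((a - mu) ** 2 for a in v) / len(v))

def fuse_units(w_ki, w_kj, w_kb, y_i, y_j):
    """Fuse unit j into unit i; return the compensated (w'_ki, w'_kb).

    Assumes y_i is not constant over the training inputs (std(y_i) > 0).
    """
    mi, mj = mean(y_i), mean(y_j)
    cov = sum((a - mi) * (b - mj) for a, b in zip(y_i, y_j))
    s = 1.0 if cov >= 0 else -1.0                 # sign of the output similarity
    ratio = s * std(y_j) / std(y_i)
    return w_ki + ratio * w_kj, w_kb + w_kj * (mj - ratio * mi)

# Responses of two units over four training inputs; unit j is an exact
# (negatively sloped) linear function of unit i, so fusion should be lossless.
y_i = [0.1, 0.4, 0.7, 0.9]
y_j = [1.0 - 0.5 * a for a in y_i]
w_ki, w_kj, w_kb = 0.8, -1.5, 0.2

w_ki_new, w_kb_new = fuse_units(w_ki, w_kj, w_kb, y_i, y_j)
before = [w_ki * a + w_kj * b + w_kb for a, b in zip(y_i, y_j)]
after = [w_ki_new * a + w_kb_new for a in y_i]
```

When the two responses are exactly linearly related, the fused unit reproduces the original weighted sum; the less similar the pair, the larger the map change caused by the fusion.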
Since no bias unit is used in the hidden layer of the HCN, only the compensation in Eq. 11 will be used. As unit fusion can be applied to all unit pairs in the hidden layer, $N(N-1)/2$ switching candidates exist for $N$ hidden units. The one which is most fit will be selected by evaluating the fitness index $I_F(f_N, f_N^{ij})$.
The fitness of the new map will be a function of the degree
of map inheritance, and the closeness of the two kernels to
be fused in the feature space, to give priority to the fusion
of kernels that are placed close together. For evaluating the
degree of map inheritance, a measure named Map Distance
will be used.
Definition 2 (Map Distance) The map distance between two mapping vector functions $f_{N_1}(\mathbf{x})$ and $f_{N_2}(\mathbf{x})$ trained with the training vector set $\{(\mathbf{x}_p, \mathbf{t}_p)\}_{p=1}^{P}$ is defined as,
$D(f_{N_1}, f_{N_2}) = \frac{1}{P} \sum_{p=1}^{P} \| f_{N_1}(\mathbf{x}_p) - f_{N_2}(\mathbf{x}_p) \|^2$    (13)
where $P$ is the number of training pairs.
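A map distance of this kind can be computed directly from the two maps and the training inputs. The sketch below assumes the squared Euclidean norm inside the average, which is an assumption about the exact form of Eq. (13); the function name and the toy maps are hypothetical.

```python
def map_distance(f1, f2, inputs):
    """Average squared output difference of two maps over the training inputs."""
    total = 0.0
    for x in inputs:
        total += sum((a - b) ** 2 for a, b in zip(f1(x), f2(x)))
    return total / len(inputs)

# Toy maps: the second one ignores the second input component.
f_a = lambda x: [x[0], x[1]]
f_b = lambda x: [x[0], 0.0]
train_inputs = [[0.0, 1.0], [1.0, 2.0]]

d_ab = map_distance(f_a, f_b, train_inputs)
d_aa = map_distance(f_a, f_a, train_inputs)
```

A map distance of zero means the candidate model inherits the trained map exactly on the training inputs, which is the ideal case for switching.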
The fitness of the candidates will be evaluated by the fitness index function $I_F(f_N, f_N^{ij})$ of Eq. (14), which combines the closeness of the two kernels to be fused in the feature space with the map distance $D(f_N, f_N^{ij})$. Here $f_N^{ij}$, $L$ and $D_{max}$ denote the map obtained by fusing the $i$-th and $j$-th units, the dimension of feature space, and the maximum possible map distance, respectively. It is assumed that all the feature elements are bounded to the $[0, 1]$ domain. On actual evaluation of the map distance, the theorem approximating the map distance generated by the fusion of hidden layer units [7] was used.
4. The automatic defect classifier system
(ADC) [8]
4.1. Defect classes
In this work, we will try to classify the physical defect
classes that provide most information for locating the cause
of the defects. The physical defect classes dealt with in this
work and their common appearances are listed in the fol-
lowing :
A. Foreign objects (FO)
This class includes defects such that external objects are
found on the wafer. Defects of FO class tend to appear as
small and dark colored regions, typically in near-circular
shape.
B. Embedded foreign objects (EO)
This is the class of defects where one or more processed film layers have been stacked over a foreign object. EO class defects appear slightly larger and more irregular in shape than those of the FO class, because the patterns of the heaped area in the covering layers are deformed by the embedded object. In addition to the characteristic dark color of the particle itself, other colors can be observed as well. Defects of FO and EO classes can appear quite similar, and are sometimes hard to distinguish even for an expert.
Figure 5. The flow of data in the ADC system. The reference image and the defect image yield the defect mask; shape feature extraction and color quantization then give the shape feature and the color ratio, which the HCN classifier maps to the defect class.
C. Pattern failure (PF)
This class covers all kinds of defects that have pattern de-
formations without any existence of external objects. De-
fects of PF class can also be caused by insufficient exposure
or etching. Thus they can have a wide variety of size and
shape. Since the defect is usually an extra region or a lack
in the pattern of a layer, the color of the defect region tends
to be one of those observed in the normal patterns.
4.2. Feature extraction
A. Shape features
The flow of data in the ADC system is shown in Fig. 5. After subtraction of the defectless reference pattern from the defect image and further graylevel thresholding, the defect mask is made. From the defect mask, the shape features of defect size and roundness are calculated.
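The mask construction and the two shape features can be sketched on a small graylevel grid. The paper does not give the formulas, so the choices below are assumptions made for illustration: the mask thresholds the absolute difference of the two images, the size is the pixel count, and the roundness is taken as 4 * pi * area / perimeter^2 with the perimeter counted as the number of exposed pixel edges. All names and values are hypothetical.

```python
import math

def defect_mask(defect, reference, threshold):
    """1 where the defect image differs from the reference by more than threshold."""
    return [[1 if abs(d - r) > threshold else 0 for d, r in zip(drow, rrow)]
            for drow, rrow in zip(defect, reference)]

def shape_features(mask):
    """Return (size, roundness) of the masked region (assumed definitions)."""
    rows, cols = len(mask), len(mask[0])
    area = sum(map(sum, mask))
    perimeter = 0
    for i in range(rows):
        for j in range(cols):
            if not mask[i][j]:
                continue
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                outside = not (0 <= ni < rows and 0 <= nj < cols)
                if outside or not mask[ni][nj]:
                    perimeter += 1                 # edge facing background
    roundness = 4 * math.pi * area / perimeter ** 2 if perimeter else 0.0
    return area, roundness

# 4x4 reference of uniform graylevel 100 with a dark 2x2 defect in the middle.
reference = [[100] * 4 for _ in range(4)]
defect = [row[:] for row in reference]
for i in (1, 2):
    for j in (1, 2):
        defect[i][j] = 40

mask = defect_mask(defect, reference, threshold=30)
size, roundness = shape_features(mask)
```

On a pixel grid this roundness measure stays below 1 even for compact shapes, so in practice the features would be normalized to the unit range before being passed to the classifier.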
B. Color features
The color of the defect region is characterized by quantizing the color of each pixel to one of the prototype colors. The prototype colors are determined beforehand by applying the Median Cut Algorithm [5] to the defectless images of the layer to be inspected. Also, typical defect colors are manually added as prototype colors. The ratios of the quantized colors in the defect region were used as the color feature vector of the defect.
In the experiments in Sec. 5, the feature dimension was 12, including the 2 shape features and 10 color features, all normalized to unity range.
Figure 6. Artificially generated cluster data of four classes in the $(x_1, x_2)$ unit square. (a) Training set (P = 100). (b) Test set (P = 1000).
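The color feature of Sec. 4.2 B can be sketched as a nearest-prototype histogram. The sketch assumes RGB pixels and squared Euclidean distance in RGB space for the quantization step, neither of which is specified in the paper, and it takes the prototype colors as given rather than deriving them with the Median Cut Algorithm. The function name and sample colors are hypothetical.

```python
def color_ratios(pixels, prototypes):
    """Fraction of defect-region pixels assigned to each prototype color.

    Assumes a non-empty list of pixels.
    """
    counts = [0] * len(prototypes)
    for p in pixels:
        nearest = min(
            range(len(prototypes)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(p, prototypes[k])))
        counts[nearest] += 1
    return [c / len(pixels) for c in counts]

prototypes = [(0, 0, 0), (255, 255, 255), (200, 30, 30)]   # dark, bright, reddish
region = [(10, 5, 0), (250, 250, 240), (190, 40, 35), (20, 20, 20)]
ratios = color_ratios(region, prototypes)
```

The resulting ratios sum to one and already lie in the unit range, so they can be concatenated with the normalized shape features to form the input vector.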
5. Experiment
A. Membership thresholding in an artificial cluster data
The effect of membership thresholding and MS was evaluated using artificial four-class data in a 2D domain, shown in Fig. 6. Three types of networks and training strategies were tried. All networks were trained to the target error of $E = 0.01$, to respond with class-specific unit vectors.
1. MLP with (input-hidden-output)=(2-4-4) units.
2. HCN with (input-hidden-output)=(2-100-4) units.
3. HCN trained by BP with MS for model reduction during training. Initial model : (2-100-4).
The change in the recognition rate for the test set, and the ratio of the area within the input domain which was pointed out as being of unknown class, were evaluated by changing the membership threshold $\theta$ in Eq. 7. Ideally, the recognition rate will be maintained high, even when a large portion of the input domain is judged as unknown (rejected). The result is shown in Fig. 7. It is clear that by reducing the model of the HCN by MS, a larger portion of the input domain is properly rejected without losing the classification ability for the test set.
Figure 7. The change in the recognition rate and the ratio of the rejected input domain, when the membership threshold is changed. Horizontal axis: ratio of rejected input domain (0 to 0.6); vertical axis: recognition rate (0.7 to 1); curves for the MLP, the HCN and the HCN with MS, with the points $\theta = 0.2$ and $\theta = 0.9$ marked.
Table 1. The classification rate and the confusion matrix for the HCN evaluated by the leave-one-out method. Rows marked "with" are for the cases when membership thresholding was used. Counts are numbers of images; the misclassified column merges the off-diagonal entries of the confusion matrix.
True class             Thresholding   Correct   Misclassified   Unknown   Correct (%)   Error (%)
Foreign Object (FO)    without        32        1               0         97.0          3.0
Foreign Object (FO)    with           32        0               1         97.0          0.0
Embedded Object (EO)   without        32        4               0         88.9          11.1
Embedded Object (EO)   with           30        1               5         83.3          2.8
Pattern Failure (PF)   without        22        2               0         91.7          8.3
Pattern Failure (PF)   with           21        0               3         87.5          0.0
Average (weighted)     without                                            92.5          7.5
Average (weighted)     with                                               89.2          1.1
B. Leave-one-out evaluation with HCN using MS
A collection of defect images obtained from the same process layer of a product was used for evaluating the ADC system. The set consisted of 33 FO class, 36 EO class and 24 PF class images. The class information for all the images was provided by an expert inspector. The classification rates were evaluated by the leave-one-out method [3]. An HCN with unit configuration of (12-93-3), initialized by placing each kernel at a training input, was trained using MS. The model typically converged to reduced models with 9 to 14 hidden layer units.
The results are shown in Table 1. By employing the membership thresholding with the threshold $\theta$ of Eq. 7, it is found that the non-diagonal elements (errors) in the confusion matrix could be reduced drastically. The obtained classification rate is considered to be comparable to those of human experts. By reducing the network model by MS, the computation required for using the network was also reduced by 85–90%, when compared with the initial network model.
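The reported averages and the size of the model reduction can be cross-checked with simple arithmetic. In the sketch below, the per-class correct counts are those implied by the class sizes and the per-class rates of Table 1, and the computation is assumed to scale with the number of hidden units, which is an assumption made only for this check; the function name is hypothetical.

```python
def weighted_rate(correct_counts, class_sizes):
    """Classification rate in percent, weighted by class size."""
    return 100.0 * sum(correct_counts) / sum(class_sizes)

class_sizes = [33, 36, 24]                                    # FO, EO, PF
rate_plain = weighted_rate([32, 32, 22], class_sizes)         # without thresholding
rate_thresholded = weighted_rate([32, 30, 21], class_sizes)   # with thresholding

# Share of hidden units removed when 93 units shrink to 14 or to 9.
unit_reduction = [100.0 * (1 - n / 93) for n in (14, 9)]
```

The two weighted rates round to the averages listed in Table 1, and the unit reduction of roughly 85% to 90% is in line with the reported saving in computation.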
6. Conclusion
An ADC system for visual inspection of semiconductor wafers using a neural network classifier was introduced. The Hyperellipsoid Clustering Network was introduced, and the training rule with cost terms for recognizing unfamiliar inputs as originating from an unknown defect class was given. Further, by using BP training with Model Switching, a reduced-model classifier that enables efficient classification was obtained. The defect classes and the descriptions of the extracted image features were defined. In the experiments, the effectiveness of the unfamiliar input recognition was confirmed, and a classification rate comparable to those of human experts was obtained.
References
[1] P. B. Chou, A. R. Rao, M. C. Struzenbecker, F. Y. Wu, and V. H. Brecher. Automatic defect classification for semiconductor manufacturing. Machine Vision and Applications, 9(4):201–214, 1997.
[2] R. O. Duda and P. E. Hart. Pattern Classification and Scene Analysis. Wiley, 1973.
[3] K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, 1990.
[4] R. Hecht-Nielsen. Neurocomputing. Addison-Wesley, 1990.
[5] P. Heckbert. Color image quantization for frame buffer display. Computer Graphics, 16(3):297–307, 1982.
[6] K. Kameyama and Y. Kosugi. Neural network pruning by fusing hidden layer units. Transactions of IEICE, E74(12):4198–4204, 1991.
[7] K. Kameyama and Y. Kosugi. Model switching by channel fusion for network pruning and efficient feature extraction. Proceedings of International Joint Conference on Neural Networks 1998, pages 1861–1866, 1998.
[8] K. Kameyama, Y. Kosugi, T. Okahashi, and M. Izumita. Automatic defect classification in visual inspection of semiconductors using neural networks. IEICE Transactions on Information and Systems, E81-D(11):1261–1271, 1998.
[9] T. Kohonen. Self-organization and associative memory. Springer, 1988.
[10] J. E. Moody and C. J. Darken. Fast learning in networks of locally-tuned processing units. Neural Computation, 1:281–294, 1989.
[11] E. Parzen. On estimation of a probability density function and mode. Annals of Mathematical Statistics, 33:1065–1076, 1962.
[12] T. Poggio and F. Girosi. Networks for approximation and learning. Proceedings of the IEEE, 78:1481–1497, 1990.
[13] R. Reed. Pruning algorithms: a survey. IEEE Trans. Neural Networks, 4(5):740–747, 1993.
[14] D. Rumelhart, J. L. McClelland, and the PDP Research Group. Parallel distributed processing. MIT Press, 1986.
[15] V. N. Vapnik. Statistical Learning Theory. Wiley, 1999.
... On the other hand, SSA has no relation with the defect data generated by local anomalies caused during wafer mounting, dicing, imbedded particle contamination, etc. (Shankar & Zhong, 2004) Automated techniques that have been developed to perform wafer inspection task through the image processing techniques Several promising well-known techniques such as digital holography (Dai, Hunt, & Schulze, 2003), digital shearography (Udupa, Ngoi, Goh, & Yusoff, 2004)and semiconductor neural networks system have been developed and reported by several research groups. (Kameyama & Kosugi, 2002) Rule-based inspections of semiconductor wafer surface have been reported (Shankar & Zhong, 2003) Defect detection usually is performed by directly comparing the two complex wavefronts taken from corresponding fields of view from adjacent die on the wafer. ...
Article
A project report submitted in partial fulfilment of the requirements for the award of the degree of Bachelor (Hons.
... For the application under discussion, this translates to getting sufficient amount of high quality historical production data for training a classifier to account for the issues highlighted above such as variations in defect cluster size, shape, orientation and location with different manufacturing yield and noise levels. Unfortunately, even though semiconductor devices are manufactured in high volumes, it was observed that there is normally a lack of sufficient number or appropriate selections of good quality training samples available in the historical production test data-logs to obtain a stable statistical inference for a chosen classifier (Kameyama & Kosugi, 1999). ...
Article
The International Technology Roadmap for Semiconductors (ITRS) identifies production test data as an essential element in improving design and technology in the manufacturing process feedback loop. One of the observations made from the high-volume production test data is that dies that fail due to a systematic failure have a tendency to form certain unique patterns that manifest as defect clusters at the wafer level. Identifying and categorising such clusters is a crucial step towards manufacturing yield improvement and implementation of real-time statistical process control. Addressing the semiconductor industry’s needs, this research proposes an automatic defect cluster recognition system for semiconductor wafers that achieves up to 95% accuracy (depending on the product type).
... These defects deteriorate electrical chip performance and process yield in factory line. Defect inspection also includes defect classification (categorizing) [3], defect size metrology and defect area identification in order to eliminate nuisance defects. Especially, the patterned wafer appearance inspection is one of the key-requirements for the next generation. ...
Article
Semiconductor design rules and process windows continue to shrink, so we face many challenges in developing new processes such as 300mm wafer, copper line and low-k dielectrics. The challenges have become more difficult because we must solve problems on patterned and un-patterned wafers. The problems include physical defects, electrical defects, and even macro defects, which can ruin an entire wafer rather than just a die. The optics and electron beam have been mainly used for detecting of the critical defects, but both technologies have disadvantages. The optical inspection is generally not enough sensitive for defects at 100nm geometries and below, while the SEM inspection has low throughput because it takes long time in preparing a vacuum and scanning 300mm. In order to find a solution to these problems, we propose the novel optical inspecting method for the critical defects on the semiconductor wafer. It is expected that the inspection system's resolution exceed the Rayleigh limit by the method. Additionally the method is optical one, so we can expect to develop high throughput inspection system. In the research, we developed the experimental equipment for the super-resolution optical inspection system. The system includes standing wave illumination shift with the piezoelectric actuator, dark-field imaging and super-resolution-post-processing of images. And then, as the fundamental verification of the super-resolution method, we performed basic experiments for scattered light detection from standard particles.
... Thus the automated recognition algorithm is highly desirable. To recognize automatically the defect patterns on wafers, Chen and Liu [20], Sikka [21], and Kameyama and Kosugi [22] adopted several types of neural networks. ...
Article
Optical inspection techniques have been widely adopted in industrial areas since they provide fast and accurate information on product quality, process status, and machine conditions. The technologies include sensing using vision, laser scattering and imaging, x-ray imaging, and other optical sensing, and data processing for classification and recognition problems. Frequently, data processing tasks are very difficult, which is mainly due to the large volume, the complexity, and the noise of the raw data acquired. Artificial neural networks have been proven to be an effective means to cope with the problems difficult to solve or inefficient to solve by convectional methodologies. This paper presents the applications of neural networks in optical inspection tasks. Among the variety of industrial areas, this paper focuses on the inspection tasks involved in printed circuit board manufacturing processes and semiconductor manufacturing processes, which are the most competing industries in the world today. In this paper, the inspection problems are addressed and the optical techniques together with neural networks to solve such problems are reviewed. The application cases to which neural networks are applied are also presented with their effects.
Article
Automatic defect classification (ADC) systems automatically classify defects that inevitably occur during semiconductor manufacturing processes. ADC is the beginning of defect management that increases the yield of semiconductor chip production, and prevents accidents in the process. It takes a lot of engineer’s labor to classify defects, but ADC can be the answer to classify all defects at low cost. ADC employs the defect image of a wafer surface, captured using scanning electron microscopy (SEM). SEM images can feature a variety of backgrounds based on the defect position on the wafer and the process steps. The manual classification of SEM images involves significant labor costs related to hiring experienced engineers. Despite recent ADC studies reporting good performance, the lack of labeled images and various backgrounds make it difficult to apply ADC in actual manufacturing processes. To address this issue, automated defect classification with defect localization is proposed herein. To this end, a classification model is specifically designed for reducing the effect of varying backgrounds using defect localization. Defect localization uses an object detection model to provide the region information of defects in SEM images. We aimed to design a classification model and defect detection model using semi-supervised learning to reduce labeling costs. Experimental results indicate that the classification performance, over 15 classes, is improved by 12.56% (9.82%p), as compared with that of supervised models.
Article
Classification of defect chip patterns is one of the most important tasks in semiconductor manufacturing process. During the final stage of the process just before release, engineers must manually classify and summarise information of defect chips from a number of wafers that can aid in diagnosing the root causes of failures. Traditionally, several learning algorithms have been developed to classify defect patterns on wafer maps. However, most of them focused on a single wafer bin map based on certain features. The objective of this study is to propose a novel approach to classify defect patterns on multiple wafer maps based on uncertain features. To classify distinct defect patterns described by uncertain features on multiple wafer maps, we propose a generalised uncertain decision tree model considering correlations between uncertain features. In addition, we propose an approach to extract uncertain features of multiple wafer maps from the critical fail bit test (FBT) map, defect shape, and location based on a spatial autocorrelation method. Experiments were conducted using real-life DRAM wafers provided by the semiconductor industry. Results show that the proposed approach is much better than any existing methods reported in the literature.
Article
In most manufacturing processes, the inspection of highly specular reflection (HSR) curved surfaces depends mainly on human inspectors, whose performance is generally subjective, variable, and therefore inadequate. An automatic vision inspection system offers objectivity, better reliability, and repeatability, and is able to carry out defect measurement to evaluate the quality of an industrial part. Thus, it is vital to develop an automatic vision system to monitor surface quality online. The main purpose of this chapter is to introduce a new defect inspection method capable of detecting defects on HSR curved surfaces and, in particular, to create a complete vision inspection system for HSR curved surfaces (e.g., chrome-plated surfaces). In the first part of this chapter, a reflection analysis of the HSR curved surface is performed, and a new method is introduced to measure the reflection properties of the inspection object. Then, a method is introduced to avoid missing defects and to address the challenges that result from the variety of defects and the complex surface topography of HSR curved surfaces. A set of images is captured under different illumination directions, and a synthetic image is reconstructed from this set. The synthetic image shows more uniform intensity than the original images because the specular areas have been completely removed. Furthermore, all defects are integrated in the synthetic image. For more complicated curved surfaces, an improved method is proposed and validated by experiments. Finally, a complete vision defect inspection system has been created. A lighting system with side and diffuse illumination is selected for the inspection system, and it succeeds in reducing the specular reflection from a curved surface, although some brightness appears at the edge. System parameters and object pose are determined by comparing defect expressivity and specular ratio in the image. Moreover, all defects can be quickly extracted by combining template matching and morphology techniques. The presented automatic vision defect inspection system has been implemented and tested on a number of simulation images and actual parts consisting of HSR curved surfaces.
Article
This paper aims to develop an on-line, real-time defect inspection system for polymer polarizers. It is intended to help producers make targeted improvements for the spot or line defects that arise in on-line polarizer manufacturing. A line-scan charge-coupled device (CCD) is used to capture the defect images of the polarizer. First, the captured images are scaled down by a factor of 64 with downsampling compression. Then, a Laplace operator, with a mask giving isotropic results for 45° and 90° rotation increments, is used to find the edges of defects. Subsequently, a statistical threshold decision method is used to segment the defect edges found by the Laplace operator from the image. Finally, spot defects arising in the production process, such as dust, foreign objects, impact scars and air bubbles, are differentiated from line defects arising in the follow-up handling process, such as scratches, using the straight-line detection characteristic of the Hough transform. In total, 200 defect images of polarizer samples were captured and input to the inspection system for testing. The differentiation rate of defects reached 98%, which shows that the on-line, real-time automatic optical inspection (AOI) system is suitable for application to polymer polarizers.
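The first three stages of the pipeline described in this abstract (downsampling, Laplacian edge response, statistical thresholding) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the block-averaging downsampler, the 8-neighbour Laplacian mask, the threshold rule `mean + k * std`, and all function names are assumptions made for the example. The Hough-transform stage is omitted.

```python
from statistics import mean, pstdev

def downsample(img, f):
    """Block-average a 2D list image by a factor f along each axis (assumed scheme)."""
    h, w = len(img) // f, len(img[0]) // f
    return [[sum(img[r * f + i][c * f + j] for i in range(f) for j in range(f)) / (f * f)
             for c in range(w)] for r in range(h)]

def laplacian(img):
    """8-neighbour Laplacian: sum of the 8 neighbours minus 8 times the centre.
    Border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            total = sum(img[r + i][c + j] for i in (-1, 0, 1) for j in (-1, 0, 1))
            out[r][c] = total - 9 * img[r][c]  # total includes the centre once
    return out

def edge_mask(img, k=2.0):
    """Flag pixels whose absolute Laplacian response exceeds mean + k * std
    (a hypothetical statistical threshold; the abstract does not give the rule)."""
    resp = laplacian(img)
    flat = [abs(v) for row in resp for v in row]
    threshold = mean(flat) + k * pstdev(flat)
    return [[abs(v) > threshold for v in row] for row in resp]
```

With `k=2.0`, a single bright pixel on a uniform background is flagged while its weaker neighbouring responses are not. A real system would tune `k` on sample images.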
Article
An automatic defect classification algorithm based on boosting is proposed. The proposed method exploits a histogram of spatial orientations together with frequency features. Specifically, the spatial gradient orientations of a defect image are accumulated into a histogram, which is used to train an SVM classifier. The frequency features are the projections of 2D Haar patterns onto the frequency responses. The classifiers using these spatial and frequency features are combined by boosting to improve the classification performance. According to experiments with 100 training and testing sets, the proposed boosting method improves the classification performance compared with previous works that use optical features such as the colors, shapes, and sizes of defects.
Article
An automatic defect classification (ADC) system for use in visual inspection of semiconductor wafers is introduced. The methods of extracting the defect features based on the human experts' knowledge, along with their correlations with the defect classes, are elucidated. As the classifier, the Hyperellipsoid Clustering Network (HCN), a layered network model employing second-order discrimination borders in the feature space, is introduced. In the experiments using a collection of defect images, the HCNs are compared with conventional multilayer perceptron networks. It is shown that the HCN's adaptive hyperellipsoidal discrimination borders are better suited to the problem. Also, the cluster encapsulation by the hyperellipsoidal border makes it possible to determine rejection classes, which is desirable when the system is in actual use. The HCN with rejection achieves an overall classification rate of 75% with an error rate of 18%, which can be considered equivalent to those of the human experts.
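The idea of a second-order (hyperellipsoidal) border with a rejection class can be illustrated with a small sketch. It assumes each cluster is described by a centre and a symmetric positive-definite matrix, and that an input is rejected when it lies outside every ellipsoid. This is a generic quadratic discriminant written for illustration, not the HCN formulation or its training procedure; the function names, the `radius` parameter and the class labels are hypothetical.

```python
def quad_dist(x, center, shape):
    """Quadratic form (x - c)^T A (x - c); the set where it equals 1 is an ellipsoid."""
    d = [xi - ci for xi, ci in zip(x, center)]
    n = len(d)
    return sum(d[i] * shape[i][j] * d[j] for i in range(n) for j in range(n))

def classify(x, clusters, radius=1.0):
    """Return the label of the nearest cluster, or 'unknown' if x lies outside
    every hyperellipsoid (quadratic distance greater than radius)."""
    best_label, best_dist = None, float("inf")
    for label, center, shape in clusters:
        dist = quad_dist(x, center, shape)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= radius else "unknown"

# Hypothetical two-class example in a 2D feature space.
clusters = [
    ("scratch", [0.0, 0.0], [[1.0, 0.0], [0.0, 4.0]]),   # elongated along the first axis
    ("particle", [5.0, 5.0], [[1.0, 0.0], [0.0, 1.0]]),  # circular cluster
]
print(classify([0.5, 0.2], clusters))  # inside the first ellipsoid
print(classify([2.5, 2.5], clusters))  # far from both clusters, so rejected
```

Because each border is closed, inputs far from all training clusters fall outside every ellipsoid and can be assigned to a rejection class, which an open hyperplane border cannot do.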
Article
We propose a network architecture which uses a single internal layer of locally-tuned processing units to learn both classification tasks and real-valued function approximations (Moody and Darken 1988). We consider training such networks in a completely supervised manner, but abandon this approach in favor of a more computationally efficient hybrid learning method which combines self-organized and supervised learning. Our networks learn faster than backpropagation for two reasons: the local representations ensure that only a few units respond to any given input, thus reducing computational overhead, and the hybrid learning rules are linear rather than nonlinear, thus leading to faster convergence. Unlike many existing methods for data analysis, our network architecture and learning rules are truly adaptive and are thus appropriate for real-time use.
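The hybrid scheme described in this abstract, self-organized placement of the locally-tuned units followed by a linear supervised rule for the output weights, can be sketched in one dimension. This is a minimal illustration under stated assumptions: k-means is used for the centres, a single fixed Gaussian `width` is shared by all units, and the output weights are trained with the LMS (delta) rule. The function names, learning rate and epoch count are chosen for the example and are not taken from the paper.

```python
import math

def kmeans_1d(xs, centers, iters=20):
    """Self-organized step: move each centre to the mean of the inputs closest to it."""
    centers = list(centers)
    for _ in range(iters):
        groups = [[] for _ in centers]
        for x in xs:
            j = min(range(len(centers)), key=lambda i: (x - centers[i]) ** 2)
            groups[j].append(x)
        # Keep the old centre if no input was assigned to it.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers

def hidden(x, centers, width):
    """Gaussian responses of the locally-tuned units."""
    return [math.exp(-((x - c) ** 2) / (2 * width ** 2)) for c in centers]

def train_output(xs, ys, centers, width, lr=0.1, epochs=200):
    """Supervised step: the LMS rule is linear in the output weights."""
    w = [0.0] * len(centers)
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            h = hidden(x, centers, width)
            err = y - sum(wi * hi for wi, hi in zip(w, h))
            w = [wi + lr * err * hi for wi, hi in zip(w, h)]
    return w

def predict(x, centers, width, w):
    return sum(wi * hi for wi, hi in zip(w, hidden(x, centers, width)))
```

A usage example: with inputs `[0.0, 0.1, 0.2, 1.0, 1.1, 1.2]`, targets `[0, 0, 0, 1, 1, 1]` and initial centres `[0.0, 1.0]`, the centres settle near 0.1 and 1.1, and thresholding `predict` at 0.5 separates the two groups. Only the units near a given input respond appreciably, which is the locality the abstract refers to.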