Semiconductor Defect Classification using Hyperellipsoid Clustering
Neural Networks and Model Switching
Keisuke Kameyama
Interdisciplinary Graduate School of Sci. and Eng.
Tokyo Institute of Technology
Yokohama 226-8502, Japan
Yukio Kosugi
Frontier Collaborative Research Center
Tokyo Institute of Technology
Yokohama 226-8503, Japan
Abstract
An automatic defect classification (ADC) system for visual inspection of semiconductor wafers, using a neural network classifier, is introduced. The proposed Hyperellipsoid Clustering Network (HCN), employing Radial Basis Function (RBF) type units in the hidden layer, is trained with additional penalty conditions for recognizing unfamiliar inputs as originating from an unknown defect class. Also, by using a dynamic model alteration method called Model Switching, a reduced-model classifier which enables efficient classification is obtained. In the experiments, the effectiveness of the unfamiliar input recognition was confirmed, and a classification rate sufficiently high for use in the semiconductor fab was obtained.
1. Introduction
Visual inspection plays an important role in the manufacturing processes of semiconductors. The disorders found on the wafer surface, such as the one shown in Fig. 1, are commonly referred to as defects. The motive for defect classification is to find out the process stages and the sources that are causing them. Early detection of the sources of defects is essential in order to maintain high product yield and quality.
By replacing the review process typically conducted by human experts, the system also aims to improve both the stability and the speed of inspection. In the literature, it is reported that the classification accuracies of human experts are typically 60-80% [1]. If this stage of visual inspection could be automated, it would greatly contribute to enhancing the productivity of the semiconductor fab.
Figure 1. A defect found on a semiconductor wafer (the scale bar indicates 5 µm).
The task of classifying the defect image features has several
specific conditions inherent to the particular problem. Most distinctive among them is the fact that the user does not have the freedom of collecting a sufficient number of, or an appropriate selection of, training images. Also, the numbers of training samples per class are extremely unbalanced.
When the number of samples for a defect class is small, approaches whose decisions rely on all samples, such as radial basis function (RBF) networks [10][12] or the joint use of nonparametric estimation of the probability density function by Parzen's method [11] and Bayes classification, perform well. However, for a class with many samples, these methods are computationally costly. In this case, instead of using all the training samples for classification, methods based on distances from class-cluster prototypes, such as the nearest neighbor algorithm [2] and learning vector quantization [9], and those based on class borders, such as multilayer perceptrons (MLP) [14] and support vector machines [15], are computationally more efficient. So-called reduced variants of the above nonparametric methods, such as generalized RBF networks [12] and reduced Parzen classifiers [3], also depend on the distances from the prototypes.
In this work, a three-layered neural network named the Hyperellipsoid Clustering Network (HCN), having hidden layer units of RBF type, will be used. In addition to parameter adjustment by the backpropagation (BP) method [14], a model alteration method called Model Switching (MS) [7], which allows the map acquired by training to be inherited by the new model, is used during the training process for efficiently obtaining an appropriate reduced model.
The second requirement of the system is to classify the known defect classes without fail and not to make wild guesses on unfamiliar defects. Such cases should be pointed out as unclassifiable and left for the human expert to examine. Since the training set will usually provide answers in only a small portion of the feature space, inputs in the remaining open space should be treated as unknown. For recognizing unfamiliar inputs, the HCN is trained with an additional penalty condition, so that the sizes of the hyperellipsoid kernels are kept small, to tightly enclose the clusters formed by the training samples.
In Sec. 2, the HCN will be introduced, together with its training method and the output interpretation method for recognition of unfamiliar inputs. In Sec. 3, the idea of Model Switching for allowing dynamic model alteration during training will be reviewed. The defect classes and the outline of the automatic defect classification (ADC) system will be explained in Sec. 4. In Sec. 5, the network and the ADC system will be evaluated by applying them to the classification of the defect image sets, and the paper will be concluded in Sec. 6.
2. Hyperellipsoid clustering network (HCN)
The three-layered network model used for classifying the feature vectors is illustrated in Fig. 2. The network has L inputs, N hidden units and O output units. The potential of the n-th hidden layer unit is defined as

u_n(x) = r_n^2 − ‖H_n (x − m_n)‖^2    (1)

with the following parameters to be adjusted in the training:

r_n ∈ ℝ : radius parameter,
m_n = (m_n1, ..., m_nL)^T ∈ ℝ^L : center vector,
H_n ∈ ℝ^{L×L} : weight matrix.

The transfer function of the hidden layer unit is the well-known sigmoid function. Thus, the output of unit n is

h_n(x) = 1 / (1 + exp(−u_n(x))).    (2)
Figure 2. The Hyperellipsoid Clustering Network (HCN): the input vector x enters the input layer (units l = 1, ..., L); the hidden layer (units n = 1, ..., N) applies a hyperellipsoid discriminant followed by a sigmoid, with parameters (H_n, m_n, r_n); the linear output layer (units k = 1, ..., O) forms weighted sums with connection weights w_k to give the output vector y.
Figure 3. An example of the kernel functions made by the joint use of (hyper)ellipsoid discriminants and sigmoid functions.
A unit in the output layer takes the fan-out of the hidden layer units and calculates a weighted sum with no bias as

y_k = w_k^T h    (3)

where w_k = (w_k1, ..., w_kN)^T ∈ ℝ^N is the weight vector of the k-th output unit, and h = (h_1, ..., h_N)^T ∈ ℝ^N. The weight vectors are also modified in the training process.
By employing the discriminant in Eq. (1), the discrimination surface in the feature space will always be a hyperellipsoid. Since the unit potential in Eq. (1) depends on the distance between the input x and the center vector m_n, the network is an RBF network. However, in contrast with the popular Gaussian RBF network [12], various profiles of the kernel function are possible by controlling the gain [4] of the sigmoid function with the radius parameter r_n, as shown in Fig. 3. This network model, using the hyperellipsoid discriminant and the sigmoid function in the hidden layer, will be referred to as the Hyperellipsoid Clustering Network (HCN).
The training method used in the HCN is based on the batched BP law with momentum terms [14]. The error criterion E is defined as

E = (1/P) Σ_{p=1}^{P} E_p = (1/P) Σ_{p=1}^{P} ‖t_p − y_p‖^2    (4)

with P, E_p, t_p ∈ ℝ^O and y_p ∈ ℝ^O denoting the cardinality of the training set, the error for the p-th training pair, the p-th training output vector and the p-th output vector, respectively.
For enabling a “tight bounding by hyperellipsoids” to implement the recognition of the unfamiliar inputs, the volume of the hyperellipsoids should be kept small as long as this does not harm the achievement of training. This can be done by setting penalty terms that restrict the radii of the hyperellipsoids. The distance from the center to the edge of the hyperellipsoid in the direction of the i-th principal component can be written as r_n / √λ_i, where λ_i is the i-th eigenvalue of the matrix H_n^T H_n, which is always positive. Thus, a penalty to suppress the absolute value of the radius parameter r_n can be considered effective. Also, a term to prevent the eigenvalues from becoming too small was necessary. This second restriction was implemented indirectly, by preventing the Euclidean norm of the matrix H_n from becoming too small. Consequently, the modification measures for the weight matrix H_n and the radius parameter r_n were formulated as

ΔH_n = Δ_BP H_n + μ_H H_n    (5)

and

Δr_n = Δ_BP r_n − μ_r r_n    (6)

with the terms Δ_BP H_n and Δ_BP r_n denoting the modification measures by the plain BP training. Parameters μ_H and μ_r denote the penalty term gains.
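Read as parameter updates, Eqs. (5)-(6) add a norm-sustaining term for H_n and a decay term for r_n on top of the plain BP steps. A minimal sketch under that reading (the exact penalty forms follow our reconstruction above; all arrays are assumed to be NumPy arrays):

```python
def penalized_step(H_n, r_n, dH_bp, dr_bp, mu_H, mu_r):
    """One parameter update for hidden unit n, with the penalty terms.

    dH_bp, dr_bp : modification measures from plain BP training
    mu_H, mu_r   : penalty term gains (assumed small positive constants)
    """
    H_n = H_n + dH_bp + mu_H * H_n   # Eq. (5): keep ||H_n|| from shrinking
    r_n = r_n + dr_bp - mu_r * r_n   # Eq. (6): suppress |r_n| (tight bounding)
    return H_n, r_n
```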
The network will be trained to respond with a class-specific unit vector. Since the output is the weighted sum of the kernel functions of the hidden layer units, it is justified to reject an output vector that does not have a significant winner. In such a case, the input pattern should be classified as originating from an unknown class. Therefore, the output interpretation

class(x) = argmax_k y_k  if max_k y_k ≥ θ;  unknown  otherwise    (7)

will be used, with θ being the membership threshold.
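Eq. (7) amounts to winner-take-all with rejection; a minimal sketch:

```python
import numpy as np

def interpret_output(y, theta):
    """Eq. (7): return the winning class index, or 'unknown' if no output
    exceeds the membership threshold theta."""
    k = int(np.argmax(y))
    return k if y[k] >= theta else "unknown"
```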
3. Model switching
As a method for obtaining a reduced network model in
the learning process, model alteration scheme called Model
Switching (MS) [7] is employed. MS is a framework for dy-
namic model alteration during the BP training for improve-
ment of the training efficiency, by avoiding the local minima
and reducing the redundancy in the network model.
Definition 1 (Model Switching) On altering the neural network model, methods which determine the moment or the occasion of model alteration by taking into account both of the following two factors:
1. The nature and fitness of the new model and of the initial map candidate within the new model.
2. The status of the immediate model and map.
will be referred to as Model Switching (MS).
In this work, MS will be used to reduce the number of hidden layer units in the HCN, in which training is initially started with a model having the same number of hidden units as training samples. Pruning algorithms [13], which are also attempts to reduce network size, mostly limit the occasion of model reduction to after the convergence of the training error. With MS, however, the occasion can be set at any time, as long as the fitness of the candidate for the initial map within the new model is met. When only model reduction is used in MS, only the first factor in Def. 1 needs to be considered.
The process of training by BP with MS is shown in Fig. 4. At each training epoch of BP, the fitness of the switchable candidates is evaluated, and switching takes place when the fitness I_F of a candidate exceeds a given threshold I_F0.
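The training loop of Fig. 4 has the following overall structure; the network interface used here (bp_epoch, error, fitness, fusion_candidates, fuse) is hypothetical, standing in for whatever the actual implementation provided:

```python
def train_with_model_switching(net, data, E0, IF0, max_epochs=10000):
    """BP training with Model Switching, following the structure of Fig. 4.

    `net` is a hypothetical HCN object exposing:
      bp_epoch(data)        -- one batched-BP parameter update
      error(data)           -- current training error E
      fusion_candidates()   -- switchable model-map candidate set C_MS
      fitness(c)            -- fitness index I_F(f_N, f_N_i) of candidate c
      fuse(c)               -- switch to the fused (reduced) model
    """
    for _ in range(max_epochs):
        net.bp_epoch(data)                  # modify parameters of f_N by BP
        if net.error(data) < E0:            # trained? (E < E0)
            break
        candidates = net.fusion_candidates()
        if candidates:
            best = max(candidates, key=net.fitness)
            if net.fitness(best) > IF0:     # switch only when a candidate is fit
                net.fuse(best)              # model size reduction
    return net
```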
The candidate set of new models and maps was made by using the unit reduction method of unit fusion [6]. Unit fusion selects a pair of units in a layer and replaces them by a single unit. On replacement, the connection weights to the new unit are determined so that the map of the old network will be inherited by the new network.
Let units indexed i and j be fused to make a single unit i′. The weighted sum of the inputs from units i and j and the unity bias b, to the subsequent layer unit k, can be written as

S_k = w_ki h_i + w_kj h_j + w_kb = w_ki (h̄_i + δ_i) + w_kj (h̄_j + δ_j) + w_kb    (8)
Figure 4. BP training with Model Switching: after each BP epoch on the immediate network f_N, the switchable model-map candidate set C_MS is determined and the fitness index I_F(f_N, f_N_i) is evaluated for all f_N_i ∈ C_MS; if max_i I_F(f_N, f_N_i) > I_F0, the network is switched to f_N_k with k = argmax_i I_F(f_N, f_N_i), otherwise no switching occurs; training ends when E < E0.
where w, h, h̄ and δ are the connection weight, the unit response, the average unit response and the varying portion of the response, respectively. Generally, we can put

δ_j ≈ ρ_ij (σ_j / σ_i) δ_i    (9)

with σ and ρ_ij denoting the standard deviation of the unit output and the output similarity of the unit pair, respectively, both evaluated over all the training inputs. From Eqs. (8) and (9), we have

S_k ≈ (w_ki + ρ_ij (σ_j / σ_i) w_kj) h_i + w_kb + w_kj (h̄_j − ρ_ij (σ_j / σ_i) h̄_i)    (10)
implying that the connection weights should be changed as,
w′_ki = w_ki + ρ_ij (σ_j / σ_i) w_kj    (11)

and

w′_kb = w_kb + w_kj (h̄_j − ρ_ij (σ_j / σ_i) h̄_i)    (12)
where the primes denote the connection weights after the fusion.
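The compensation of Eqs. (11)-(12) is a simple weight transfer; a sketch for a single subsequent-layer unit k, where the statistics (ρ_ij, σ, h̄) are assumed to be precomputed over the training inputs:

```python
def fused_weights(w_ki, w_kj, w_kb, rho_ij, sigma_i, sigma_j, hbar_i, hbar_j):
    """Outgoing weights of unit k after fusing hidden units i and j."""
    s = rho_ij * sigma_j / sigma_i
    w_ki_new = w_ki + s * w_kj                       # Eq. (11)
    w_kb_new = w_kb + w_kj * (hbar_j - s * hbar_i)   # Eq. (12): bias compensation
    return w_ki_new, w_kb_new
```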
Since no bias unit is used in the hidden layer of the HCN, only the compensation in Eq. (11) will be used. As unit fusion can be applied to all unit pairs in the hidden layer, N(N − 1)/2 switching candidates exist. The one which is most fit will be selected by evaluating the fitness index I_F(f_N, f_N_ij).
The fitness of the new map will be a function of the degree of map inheritance and of the closeness, in the feature space, of the two kernels to be fused, so as to give priority to the fusion of kernels that are placed close together. For evaluating the degree of map inheritance, a measure named Map Distance will be used.
Definition 2 (Map Distance) The map distance between two mapping vector functions f_N1(x) and f_N2(x), trained with the training vector set {x_p}_{p=1}^{P}, is defined as

D(f_N1, f_N2) = (1/P) Σ_{p=1}^{P} ‖f_N1(x_p) − f_N2(x_p)‖    (13)

where P is the number of training pairs.
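Definition 2 translates directly into code; a sketch in which f1 and f2 are callables returning the networks' output vectors:

```python
import numpy as np

def map_distance(f1, f2, X):
    """Eq. (13): mean output discrepancy of two maps over the training inputs X."""
    return np.mean([np.linalg.norm(f1(x) - f2(x)) for x in X])
```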
The fitness of the candidates will be evaluated by the fitness index function

I_F(f_N, f_N_ij) = (1 − D(f_N, f_N_ij) / D_max) (1 − ‖m_i − m_j‖ / √L)    (14)

where f_N_ij, L and D_max denote the map obtained by fusing the i-th and j-th units, the dimension of the feature space, and the maximum possible map distance, respectively. It is assumed that all the feature elements are bounded to the [0, 1] domain. On actual evaluation of the map distance, the theorem approximating the map distance generated by the fusion of hidden layer units [7] was used.
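Under the product-form reconstruction of Eq. (14) given above, the fitness index can be sketched as:

```python
import numpy as np

def fitness_index(D, D_max, m_i, m_j, L):
    """Eq. (14) as reconstructed: large when the fused map inherits the old map
    (small D / D_max) and the fused kernels lie close together within the unit
    hypercube (small ||m_i - m_j|| / sqrt(L))."""
    return (1.0 - D / D_max) * (1.0 - np.linalg.norm(m_i - m_j) / np.sqrt(L))
```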
4. The automatic defect classifier system
(ADC) [8]
4.1. Defect classes
In this work, we will try to classify the physical defect classes that provide the most information for locating the cause of the defects. The physical defect classes dealt with in this work, and their common appearances, are listed in the following:
A. Foreign objects (FO)
This class includes defects in which external objects are found on the wafer. Defects of the FO class tend to appear as small, dark-colored regions, typically near-circular in shape.
B. Embedded foreign objects (EO)
This is the class of defects where one or more processed film layers have been stacked over a foreign object. EO class defects appear slightly larger and more irregularly shaped than those of the FO class, because the patterns of the heaped area in the covering layers are deformed by the embedded object. In addition to the characteristic dark color of the particle itself, other colors can be observed as well. Defects of the FO and EO classes can appear quite similar, and are sometimes hard to distinguish even for an expert.

Figure 5. The flow of data in the ADC system: from the defect image and a defectless reference image, the defect mask is made; shape feature extraction and color quantization then yield the shape features and color ratios that are fed to the HCN classifier, which outputs the defect class.
C. Pattern failure (PF)
This class covers all kinds of defects that show pattern deformations without the presence of external objects. Defects of the PF class can also be caused by insufficient exposure or etching; thus they can have a wide variety of sizes and shapes. Since the defect is usually an extra region or a missing region in the pattern of a layer, the color of the defect region tends to be one of those observed in the normal patterns.
4.2. Feature extraction
A. Shape features
The flow of data in the ADC system is shown in Fig. 5. After subtraction of the defectless reference pattern from the defect image and further graylevel thresholding, the defect mask is made. From the defect mask, the shape features of defect size and roundness are calculated.
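A minimal sketch of this step follows; the threshold value and the roundness formula (4πA/P², a common definition) are our assumptions, as the paper does not spell them out.

```python
import numpy as np

def shape_features(defect_img, reference_img, thresh=30):
    """Defect mask by reference subtraction + thresholding, then size and roundness."""
    mask = np.abs(defect_img.astype(float) - reference_img.astype(float)) > thresh
    area = int(mask.sum())                      # defect size in pixels
    padded = np.pad(mask, 1)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    perimeter = int((mask & ~interior).sum())   # boundary pixel count
    roundness = 4.0 * np.pi * area / perimeter ** 2 if perimeter else 0.0
    return area, roundness
```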
B. Color features
The color of the defect region is characterized by quantiz-
ing the color of each pixel to one of the prototype colors.
The prototype colors are determined beforehand by applying the Median Cut Algorithm [5] to defectless images of the layer to be inspected.

Figure 6. Artificially generated cluster data of four classes, with features x1, x2 in the unit square. (a) Training set (P = 100). (b) Test set (P = 1000).

Also, typical defect colors
are manually added as prototype colors. The ratios of the
quantized colors in the defect region were used as the color
feature vector of the defect.
In the experiments in Sec. 5, the feature dimension was
12, including the 2 shape features and 10 color features, all
normalized to unity range.
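The color feature is a nearest-prototype quantization histogram over the defect region; a sketch, with the pixel array and prototype list as placeholders:

```python
import numpy as np

def color_ratio_features(defect_pixels, prototypes):
    """Quantize each defect-region pixel to its nearest prototype color and
    return the ratio of each prototype color within the region.

    defect_pixels : (M, 3) RGB values of pixels inside the defect mask
    prototypes    : (Q, 3) prototype colors (Median Cut + manual additions)
    """
    d = np.linalg.norm(defect_pixels[:, None, :].astype(float)
                       - np.asarray(prototypes, dtype=float)[None, :, :], axis=2)
    nearest = np.argmin(d, axis=1)               # quantize each pixel
    counts = np.bincount(nearest, minlength=len(prototypes))
    return counts / counts.sum()                 # color ratio vector
```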
5. Experiment
A. Membership thresholding on artificial cluster data
The effect of membership thresholding and MS was evaluated using the artificial four-class data in a 2D domain shown in Fig. 6. Three types of networks and training strategies were tried. All networks were trained to the target error of E_0 = 0.01, to respond with class-specific unit vectors.
1. MLP with (input-hidden-output) = (2-4-4) units.
2. HCN with (input-hidden-output) = (2-100-4) units.
3. HCN trained by BP with MS for model reduction during training. Initial model: (2-100-4).
The change in the recognition rate for the test set, and the ratio of the area within the input domain which was pointed out as being of unknown class, were evaluated by changing the membership threshold θ in Eq. (7). Ideally, the recognition rate will remain high even when a large portion of the input domain is judged as unknown (rejected). The result is shown in Fig. 7. It is clear that by reducing the model of the HCN by MS, a larger portion of the input domain is properly rejected without losing classification ability on the test set.
Figure 7. The change in the recognition rate and the ratio of the rejected input domain when the membership threshold is changed from θ = 0.2 to θ = 0.9, for the MLP, the HCN, and the HCN with MS (recognition rate 0.7-1.0 on the vertical axis; ratio of rejected input domain 0-0.6 on the horizontal axis).
Table 1. The classification rate and the confusion matrix for the HCN evaluated by the leave-one-out method. In each pair a/b, the second number (bold typeface in the original) is for the case when membership thresholding was used.

True class \ Estimation      FO       EO       PF     Unknown   Correct (%)   Error (%)
Foreign Object (FO)        32/32     1/0      0/0      0/1      97.0/97.0     3.0/0.0
Embedded Object (EO)        2/0     32/30     2/1      0/5      88.9/83.3    11.1/2.8
Pattern Failure (PF)        0/0      2/0     22/21     0/3      91.7/87.5     8.3/0.0
Average rate (weighted)                                         92.5/89.2     7.5/1.1
B. Leave-one-out evaluation with HCN using MS
A collection of defect images obtained from the same process layer of a product was used for evaluating the ADC system. The set consisted of 33 FO class, 36 EO class and 24 PF class images. The class information for all the images was provided by an expert inspector. The classification rates were evaluated by the leave-one-out method [3]. An HCN with a unit configuration of (12-93-3), initialized by placing one kernel at each training input, was trained using MS. The model typically converged to reduced models with 9 to 14 hidden layer units.
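The evaluation protocol retrains on all samples but one and tests on the held-out sample; a generic sketch, with train and classify standing in for HCN training with MS and the Eq. (7) interpretation:

```python
def leave_one_out(X, labels, train, classify):
    """Leave-one-out classification rate [3] over a sample set."""
    n, correct = len(X), 0
    for i in range(n):
        keep = [j for j in range(n) if j != i]
        model = train([X[j] for j in keep], [labels[j] for j in keep])
        correct += (classify(model, X[i]) == labels[i])
    return correct / n
```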
The results are shown in Table 1. By employing the membership thresholding, it is found that the non-diagonal elements (errors) in the confusion matrix could be reduced drastically. The obtained classification rate is considered to be comparable to those of human experts. By reducing the network model by MS, the computation required for using the network was also reduced by 85-90% compared with the initial network model.
6. Conclusion
An ADC system for visual inspection of semiconductor wafers, using a neural network classifier, was introduced. The Hyperellipsoid Clustering Network was presented, and a training rule with cost terms for recognizing unfamiliar inputs as originating from an unknown defect class was given. Further, by using BP training with Model Switching, a reduced-model classifier which enables efficient classification was obtained. The defect classes and the descriptions of the extracted image features were given. In the experiments, the effectiveness of the unfamiliar input recognition was confirmed, and a classification rate comparable to those of human experts was obtained.
References
[1] P. B. Chou, A. R. Rao, M. C. Struzenbecker, F. Y. Wu, and
V. H. Brecher. Automatic defect classification for semicon-
ductor manufacturing. Machine Vision and Applications,
9(4):201–214, 1997.
[2] R. O. Duda and P. E. Hart. Pattern Classification and Scene
Analysis. Wiley, 1973.
[3] K. Fukunaga. Introduction to Statistical Pattern Recogni-
tion. Academic Press, 1990.
[4] R. Hecht-Nielsen. Neurocomputing. Addison-Wesley, 1990.
[5] P. Heckbert. Color image quantization for frame buffer dis-
play. Computer Graphics, 16(3):297–307, 1982.
[6] K. Kameyama and Y. Kosugi. Neural network pruning
by fusing hidden layer units. Transactions of IEICE,
E74(12):4198–4204, 1991.
[7] K. Kameyama and Y. Kosugi. Model switching by chan-
nel fusion for network pruning and efficient feature extrac-
tion. Proceedings of International Joint Conference on Neu-
ral Networks 1998, pages 1861–1866, 1998.
[8] K. Kameyama, Y. Kosugi, T. Okahashi, and M. Izumita. Au-
tomatic defect classification in visual inspection of semicon-
ductors using neural networks. IEICE Transactions on In-
formation and Systems, E81-D(11):1261–1271, 1998.
[9] T. Kohonen. Self-organization and associative memory.
Springer, 1988.
[10] J. E. Moody and C. J. Darken. Fast learning in networks of
locally-tuned processing units. Neural Computation, 1:281–
294, 1989.
[11] E. Parzen. On estimation of a probability density function
and mode. Annals of Mathematical Statistics, 33:1065–
1076, 1962.
[12] T. Poggio and F. Girosi. Networks for approximation and
learning. Proceedings of the IEEE, 78:1481–1497, 1990.
[13] R. Reed. Pruning algorithms: a survey. IEEE Trans. Neural Networks, 4(5):740–747, 1993.
[14] D. Rumelhart, J. L. McClelland, and the PDP Research
Group. Parallel distributed processing. MIT Press, 1986.
[15] V. N. Vapnik. Statistical Learning Theory. Wiley, 1999.