Figure 3 - uploaded by Anwar Irmatov
Face detection algorithm processing diagram 

Source publication
Article
A new algorithm based on a coarse-to-fine search technique for detecting faces in a digital image is proposed. Test results on public image databases demonstrate its high accuracy. An efficient implementation of the method for a DSP is reported.

Context in source publication

Context 1
... threshold and a binary image is constructed. At the next step, the Circular Hough Transform is calculated on the obtained binary image. It produces an accumulator array in which the value of each cell is proportional to the number of binary-image pixels lying on the circle whose center is given by the cell's coordinates. The accumulator is then analyzed: cells with values above a predefined threshold are found, and a list of the corresponding circle centers is constructed. In practice, a set of neighboring pixels of the binary image can generate several arcs with similar parameters. All of them in fact describe the same circle, so these arcs can be merged. The merging is done by clustering the found circle centers with the K-means technique. The circle centers computed by the clustering procedure give the possible head positions. The described algorithm detects circles of a fixed radius. To detect circles of different radii, the same scheme is applied to an image pyramid. Examples of head detection results are shown in Figure 2.

The results of head detection are processed at the second stage of the face detection algorithm. Accurate face localization is performed by a cascade of neural networks with the SNoW (Sparse Network of Winnows) architecture [24]. The cascade is a sequence of classifiers built on SNoW neural networks [6]. An image fragment to be classified is processed by the cascade stages sequentially. If the current stage classifies the fragment as “non-face”, further processing of this fragment is dropped. The cascade is thus a degenerate decision tree: evaluation of each subsequent classifier is triggered by a positive decision of the previous one. The structure of the cascade is motivated by the fact that, within any image, the overwhelming majority of sub-windows belong to the non-face class, and only a few of them contain a face.
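Returning to the head detection step described above, the voting and accumulator analysis can be illustrated with a minimal sketch. It assumes a binary image stored as a list of rows and a single fixed radius. The function names, the angular sampling `n_angles`, and the threshold of 80% of the peak are illustrative assumptions rather than details from the source publication, and the K-means merging of nearby centers is omitted.

```python
import math

def hough_circle_accumulator(binary, radius, n_angles=72):
    """Vote for circle centers of one fixed radius.

    Each foreground pixel votes for every cell that could be the center
    of a circle of the given radius passing through that pixel.
    """
    height, width = len(binary), len(binary[0])
    acc = [[0] * width for _ in range(height)]
    # Distinct integer offsets from a center to points on the circle.
    offsets = {
        (round(radius * math.cos(2 * math.pi * k / n_angles)),
         round(radius * math.sin(2 * math.pi * k / n_angles)))
        for k in range(n_angles)
    }
    for y, row in enumerate(binary):
        for x, value in enumerate(row):
            if not value:
                continue
            for dx, dy in offsets:
                cx, cy = x - dx, y - dy
                if 0 <= cx < width and 0 <= cy < height:
                    acc[cy][cx] += 1
    return acc

def candidate_centers(acc, threshold):
    """Return (x, y) of accumulator cells whose vote count exceeds threshold."""
    return [(x, y)
            for y, row in enumerate(acc)
            for x, votes in enumerate(row)
            if votes > threshold]

def draw_circle(size, cx, cy, radius, n_points=360):
    """Build a toy binary image containing one circle outline."""
    img = [[0] * size for _ in range(size)]
    for k in range(n_points):
        angle = 2 * math.pi * k / n_points
        x = round(cx + radius * math.cos(angle))
        y = round(cy + radius * math.sin(angle))
        if 0 <= x < size and 0 <= y < size:
            img[y][x] = 1
    return img

# Toy input: one circle of radius 10 centered at (20, 20).
image = draw_circle(size=41, cx=20, cy=20, radius=10)
acc = hough_circle_accumulator(image, radius=10)
peak = max(max(row) for row in acc)
centers = candidate_centers(acc, threshold=0.8 * peak)  # assumed threshold rule
print(centers)
```

Several neighboring cells may pass the threshold around the true center, which is the situation the clustering of circle centers is meant to resolve.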
Since non-face sub-windows dominate, the cascade stages are constructed so that each stage retains as many face fragments as possible while rejecting as many non-faces as possible. It should also be taken into account that simpler classifiers are evaluated much faster, so it is expedient to place them at the first stages of the cascade. A classifier based on a SNoW neural network operates at every cascade stage. The complexity and efficiency of such a classifier depend on the size of its input layer and, consequently, on the size of the processed image. Let the cascade process images of a fixed size. To fit the image to the input layer of the corresponding neural network, the image has to be scaled before processing by each cascade classifier. Thus the input fragment is scaled with some ratio and passed to classifier #1. If the response of the classifier is negative, processing stops and the input image is classified as “non-face”. Otherwise the image is scaled with another ratio and the result is processed by classifier #2. In the same manner, each classification result either triggers the next cascade stage or terminates the cascade. If the responses of all cascade classifiers are positive, the input image is assigned to the “face” class.

The whole detection process is based on the sliding window technique. A region of interest (ROI) of an image is scanned by a window of fixed size. The described cascade is applied to the fragment at every window position and classifies it as “face” or “non-face”. A cascade of 4 stages was developed for the proposed face detection algorithm. The ROI of the image is scanned with a window of 24x24 pixels. The parameters of the stages are listed in Table 1. The sizes of the fragments processed by the cascade stages are divisors of 24, which contributes to a fast implementation of the classification process.
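The cascade and sliding-window scan described above can be sketched as follows. This is only an illustration of the control flow: the stage classifiers are hypothetical brightness tests standing in for the SNoW networks, the stage input sizes are arbitrary divisors of 24 rather than the values of Table 1, and the scan stride is an assumption, since the text does not state one.

```python
def downscale(fragment, size):
    """Shrink a square fragment to size x size by block averaging.

    Assumes size divides the fragment side, as with divisors of 24.
    """
    step = len(fragment) // size
    return [[sum(fragment[y * step + j][x * step + i]
                 for j in range(step) for i in range(step)) / (step * step)
             for x in range(size)]
            for y in range(size)]

def run_cascade(fragment, stages):
    """Return True only if every stage accepts the fragment.

    stages is a list of (input_size, classifier) pairs. Processing stops
    at the first negative response.
    """
    for input_size, classifier in stages:
        if not classifier(downscale(fragment, input_size)):
            return False  # rejected as "non-face", later stages are skipped
    return True

def scan(image, stages, window=24, stride=4):
    """Slide a fixed-size window over the image and collect accepted positions."""
    hits = []
    for top in range(0, len(image) - window + 1, stride):
        for left in range(0, len(image[0]) - window + 1, stride):
            fragment = [row[left:left + window]
                        for row in image[top:top + window]]
            if run_cascade(fragment, stages):
                hits.append((left, top))
    return hits

def mean_above(level):
    """Hypothetical stand-in classifier: accept if mean brightness exceeds level."""
    def classify(pixels):
        flat = [v for row in pixels for v in row]
        return sum(flat) / len(flat) > level
    return classify

# Four placeholder stages, cheapest (smallest input) first.
stages = [(3, mean_above(0.2)), (6, mean_above(0.4)),
          (8, mean_above(0.6)), (12, mean_above(0.8))]

# Toy 48x48 image: dark background with a bright 24x24 block at (12, 8).
image = [[1.0 if 12 <= x < 36 and 8 <= y < 32 else 0.0 for x in range(48)]
         for y in range(48)]
hits = scan(image, stages)
print(hits)
```

Windows that only partly overlap the bright block can also pass, which mirrors the multiple detections around a single face that a real cascade produces.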
It should be noted that the largest fragment processed by the cascade stages is 12x12 pixels, i.e. the whole source fragment is not needed for correct classification. The neural networks of the cascade were trained on a set formed from several image databases. From 5 to 10 face fragments were generated for every image. The fragments were produced by randomly rotating the images by up to 10 degrees and scaling them between 85% and 115%. Every face fragment was cropped and scaled down to 24x24 pixels. In total there were more than 100000 face samples in the training set. A set of negative samples was constructed from non-face fragments randomly selected in images without faces. Since it is difficult to collect a representative set of non-face examples, the bootstrap method [25] was used to include more non-face examples during training. The training set did not contain fragments of images from the test sets.

The scheme of the face detection algorithm is shown in Figure 3. At the first stage an image pyramid is constructed for detecting faces of various sizes. Then the coarse face detection step is performed for every image of the pyramid by means of the head detection algorithm. The next stage prepares the possible head positions for scanning with the cascade of neural networks. Because of errors at the head localization step and slight variations of the face area position relative to the head center for different people, rectangular regions are constructed around the possible head positions. The union of the regions around the different head positions forms a scanning mask for the currently processed image. The neural network cascade scans only positions under this mask.

There can be many fragments classified as faces after all images of the pyramid have been processed. The reason is that the classifiers of the cascade are insensitive to slight changes in face translation and scale.
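The construction of the scanning mask from candidate head positions, described above, can be sketched as a union of rectangles. The function names, the rectangle half-sizes, and the toy coordinates are illustrative assumptions. The source does not give the actual region dimensions.

```python
def build_scanning_mask(width, height, head_centers, half_width, half_height):
    """Mark every pixel inside a rectangle around any candidate head center."""
    mask = [[False] * width for _ in range(height)]
    for cx, cy in head_centers:
        # Clip each rectangle to the image borders.
        for y in range(max(0, cy - half_height), min(height, cy + half_height + 1)):
            for x in range(max(0, cx - half_width), min(width, cx + half_width + 1)):
                mask[y][x] = True
    return mask

def masked_area(mask):
    """Number of pixels the cascade would have to scan."""
    return sum(sum(row) for row in mask)

# Toy example: two overlapping candidates and one near the image corner.
centers = [(30, 30), (40, 30), (95, 75)]
mask = build_scanning_mask(100, 80, centers, half_width=10, half_height=10)
area = masked_area(mask)
reduction = (100 * 80) / area  # how many times the scanned area shrinks
print(area, round(reduction, 2))
```

Overlapping rectangles are counted once because the mask is a union, so the scanned area grows more slowly than the number of candidates.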
As a result of this insensitivity, multiple detections occur around each face in a scanned image. False detections, on the other hand, are often isolated. In practice it usually makes sense to return one final detection per face. In addition, analyzing positive cascade responses that lie close to each other adds information for the final decision about the face location. These circumstances motivated the development of an algorithm for integrating multiple detections in the image. The algorithm consists of two stages:

1. Cluster the positive responses of the cascade.
2. Process the resulting clusters by comparing the number of responses in each cluster and the cluster weight with thresholds. If both values are greater than the thresholds, the cluster yields a single final detection.

The multiple-detection integration procedure produces the list of faces detected in the image.

The choice of test image databases was determined by the application of the proposed face detection algorithm in the access control system described above. The algorithm was evaluated on the public image datasets Caltech [26] and BioID [27]. The Caltech dataset contains 450 images of frontal faces. The image size is 896x592 pixels. The images were captured under uncontrolled illumination and with complex backgrounds. The BioID database is a set of 1521 frontal-view images with a resolution of 384x286 pixels. This database is characterized by varying illumination and complex backgrounds.

Several experiments were carried out to evaluate the efficiency of the proposed algorithm. The first test is devoted to the analysis of the head detection algorithm. While processing every image from the test databases, the areas of the scanning masks produced by the head detection stage are calculated and stored. These values are then compared with the sum of the areas of all images in the image pyramid scanned by the detector.
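The two-stage integration of multiple detections described earlier in this passage can be sketched as follows. The source does not specify the clustering method or how the cluster weight is defined. This sketch assumes a simple greedy proximity clustering and takes the weight to be the sum of the response scores. All names and threshold values are hypothetical.

```python
def center(members):
    """Mean position of the (x, y, score) responses in a cluster."""
    n = len(members)
    return (sum(m[0] for m in members) / n, sum(m[1] for m in members) / n)

def cluster_responses(responses, max_distance):
    """Stage 1: greedy clustering of (x, y, score) positive responses.

    A response joins the first cluster whose current center lies within
    max_distance. Otherwise it starts a new cluster.
    """
    clusters = []
    for response in responses:
        x, y, _ = response
        for members in clusters:
            cx, cy = center(members)
            if (x - cx) ** 2 + (y - cy) ** 2 <= max_distance ** 2:
                members.append(response)
                break
        else:
            clusters.append([response])
    return clusters

def integrate_detections(responses, max_distance, min_count, min_weight):
    """Stage 2: keep clusters whose size and weight both exceed the thresholds."""
    final = []
    for members in cluster_responses(responses, max_distance):
        weight = sum(score for _, _, score in members)  # assumed weight definition
        if len(members) > min_count and weight > min_weight:
            final.append(center(members))
    return final

# Toy responses: four close hits around one face and one isolated false hit.
responses = [(50, 50, 0.9), (52, 51, 0.8), (49, 48, 0.95),
             (51, 52, 0.7), (120, 30, 0.6)]
faces = integrate_detections(responses, max_distance=8,
                             min_count=2, min_weight=2.0)
print(faces)
```

The isolated response forms a cluster of size one and is discarded, which matches the observation that false detections are often single.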
Table 2 shows by what factor the area scanned by the cascade decreases when the head detection stage is applied. The table gives values for both test datasets. Average processing times for images from each of the test databases are presented in Table 3. These tests were carried out on a PC with a Core 2 Quad 2.85 GHz processor and 3.25 GB of RAM. The last test is devoted to the detection accuracy of the algorithm. Table 4 lists the resulting correct detection rate and number of false detections for each of the test datasets. The best previously published result on the BioID database known to us is 93% correct detections with no false detections [8]. Hence the proposed algorithm outperforms that one by ...
