Fig 1 - uploaded by Seba Susan
Original Image. Step 2: Transform the image into the CIE L*a*b* color space and obtain the a* (red-green axis) and b* (blue-yellow axis) color dimensions. Binarize the a* and b* matrices using Otsu's threshold. The largest connected component in the intersection of a* and b* represents the face. Since a* values are high for reddish hues and b* values are high for yellowish shades, the skin is segmented out with high accuracy, as shown in Fig. 2(a).
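A minimal sketch of this segmentation step, assuming OpenCV's 8-bit L*a*b* conversion (an illustration of the described procedure, not the authors' code):

```python
import cv2
import numpy as np

def segment_face(bgr_image):
    # Convert to L*a*b* and take the a* (red-green) and b* (blue-yellow) channels
    lab = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2LAB)
    a_chan, b_chan = lab[:, :, 1], lab[:, :, 2]

    # Otsu binarization of each chromatic channel
    _, a_bin = cv2.threshold(a_chan, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    _, b_bin = cv2.threshold(b_chan, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Intersection of the two binary maps marks candidate skin pixels
    skin = cv2.bitwise_and(a_bin, b_bin)

    # Keep the largest connected component, assumed to be the face
    n, labels, stats, _ = cv2.connectedComponentsWithStats(skin, connectivity=8)
    if n <= 1:
        return skin  # no foreground component found
    largest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])
    return np.where(labels == largest, 255, 0).astype(np.uint8)
```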

Source publication
Conference Paper
Full-text available
Most of the face recognition approaches in the literature rely on automatic eye-pair detection algorithms for locating the eye positions, followed by normalization of the face image based on the distance between the eyes. In this paper we propose a supervised fuzzy eye-pair detection algorithm that can be executed in real time and requires minimal tra...

Contexts in source publication

Context 1
... Membership of the distance between the eyes; x1, x2, x3, x4, x5, x6, x7, x8 are the mean values of the respective features and σ1, σ2, σ3, σ4, σ5, σ6, σ7, σ8 are their standard deviations. These values are obtained from the means and standard deviations of measurements made on the manually annotated eye pairs in ten reference images, selected randomly from the database and subjected to the binarization procedure explained above. ...
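A hypothetical sketch of how such memberships could be computed, assuming Gaussian membership functions built from the feature means x1...x8 and standard deviations σ1...σ8 estimated from the ten annotated reference images (the paper's exact membership form is not reproduced here):

```python
import numpy as np

def fit_memberships(reference_features):
    """reference_features: (n_images, n_features) measurements from annotated eye pairs."""
    means = reference_features.mean(axis=0)        # x1 ... x8
    stds = reference_features.std(axis=0, ddof=1)  # sigma1 ... sigma8
    return means, stds

def membership(candidate_features, means, stds):
    """Per-feature fuzzy membership of a candidate eye pair, each value in [0, 1]."""
    return np.exp(-((candidate_features - means) ** 2) / (2.0 * stds ** 2))
```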

Citations

... This face-centering process with focused attention resulting from eye-schema detection would be analogous in process to the algorithms employed for face-pattern matching in artificial intelligence programs (cf. Seba and Kadyan 2013; Campadelli et al. 2009). Despite habituation to previously viewed models, detection of the discrepancy between these ... (Fig. 7 caption: Average elapsed time for 10 opercular beats for the model-1 group A and model-2 group B; means and standard-error values are shown.) ...
Article
Full-text available
Previous research has shown that African jewel fish (Hemichromis bimaculatus) recognize pair-bonded mates during their exchanges of egg-guarding duties. The current research examined the perceptual cues for face recognition by comparing two face models displaying anatomically realistic arrangements of blue iridophores derived from discriminant function analysis of distinct sibling groups. Four groups, each consisting of 9 subadults, were examined using a narrow compartment restraining lateral movement, where face models were presented at eye level for eight trials. Because respiratory movement of the operculum can mechanically displace the eye, thereby shifting the retinal image, jewel fish reduce their respiration rate during increased attention. When two experimental groups were presented with the same face models on four trials following initial model presentations, both groups exhibited stable respiration rates indicative of model habituation. When the habituated face models were switched to novel face models on the fifth trial, the rates of respiration decreased, as measured by reliable increases in the elapsed times of opercular beats. Switching the models back to the habituated models on the sixth trial caused reliable decreases in the elapsed times of opercular beats, resembling the earlier trials for the habituated models. Switching the face models again to the formerly novel models on the seventh trial produced respiration rates that resembled those of the habituated models. The two control groups viewing the same models for all eight trials exhibited no substantial change in respiration rates. Together, these findings indicate that jewel fish can learn to recognize novel faces displaying unique arrangements of iridophores after one trial of exposure.
... The matching results obtained by comparing parts of the image are more useful than those obtained by comparing the image as a whole, as observed in the facial expression recognition experiment in [20] wherein the facial image was divided into grids. In most practices, the parts that are learnt pertain to specific parts of objects such as car wheels [29], cat heads, human bodies and faces [30], and specific facial features like the eye pair [31]. In our case, for numeral images that are static over time, we assume that semantic interpretations are related to the top-half, bottom-half, left-half and right-half parts of the numeral image. ...
Article
Full-text available
A novel sub-part learning scheme is introduced in our work for the purpose of recognizing handwritten numeral images. The idea is borrowed from the concept of visual perception and part-wise integration of visual information by the cortical regions of the brain. In this context, each numeral image is divided into four half-parts: top-half, bottom-half, left-half and right-half, with the other half of the image kept masked. An efficient data representation is derived in an unsupervised manner from each image part, using convolutional auto-encoders (CAE), for our learning scheme that involves both early and late fusion of features. The chief advantage of the features derived from convolutional auto-encoders is the preservation of 2D spatial locality while the features are being filtered layer-by-layer through the convolutional architecture. The features derived from each individual CAE are fused by concatenation in our early fusion scheme and learnt using an appropriate classifier. The late fusion strategy involves learning the probability density pertaining to the predicted values emanating from the four base classifiers using a meta-learner classifier. The early-cum-late fusion is proposed in the later stage of our work to combine the goodness of both schemes and enhance the performance. The support vector machine is used in all the classification stages. Experiments on the benchmark MNIST dataset of handwritten English numerals prove that our method competes favorably with the state of the art, as inferred from the high classification scores achieved. Our method thus provides a computationally simple and effective methodology for sub-part learning and part-wise integration of information from different parts of the image. The method also contributes to savings in computational expense since only a small part of the image is processed at a time, speeding up the inferencing process.
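As a rough illustration of the half-part views described above (function and variable names are ours, not the paper's), each view keeps one half of the numeral image and masks the other half before being fed to its own convolutional auto-encoder:

```python
import numpy as np

def half_part_views(image):
    """image: 2D grayscale numeral image, e.g. a 28x28 MNIST digit."""
    h, w = image.shape
    views = {}
    for name in ("top", "bottom", "left", "right"):
        view = np.zeros_like(image)      # the complementary half stays masked (zero)
        if name == "top":
            view[: h // 2, :] = image[: h // 2, :]
        elif name == "bottom":
            view[h // 2 :, :] = image[h // 2 :, :]
        elif name == "left":
            view[:, : w // 2] = image[:, : w // 2]
        else:
            view[:, w // 2 :] = image[:, w // 2 :]
        views[name] = view
    return views
```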
... The distance- or similarity-based classifiers are still popular in the field of Computer Vision despite advances in deep learning [20,24]. Sometimes a non-linear transformation of the data space helps in constructing improved classifiers for real-world datasets [21,27,28]. ...
Chapter
Full-text available
Most of the real-world datasets suffer from the problem of imbalanced class representation, with some classes having more than sufficient samples while some other classes are heavily underrepresented. The imbalance in the class distributions of the majority and minority samples renders conventional classifiers like the k-Nearest Neighbor (kNN) classifier ineffective due to the heavy bias towards the majority class. In this paper, we propose data space transformation of the training set by distance metric learning prior to an enhanced classification phase. The classification phase comprises partitioning the set of k-nearest neighbors of each transformed test sample into two clusters based on their distances from the two extreme members in the set. A majority voting of the training samples within the ‘elite’ cluster that is closest to the test sample indicates the label of the test sample. Our proposed method is called Data Space Transformation with Metric Learning and Elite k-Nearest Neighbor cluster formation (DST-ML-EkNN). Extensive experiments on benchmark datasets using a variety of Metric Learning methods, with comparisons to the state-of-the-art, establish the supremacy of our learning approach for imbalanced datasets. Link to code: https://github.com/AmiteshDTU/DST-ML-EkNN
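A sketch of the elite-cluster voting step as we read it from the abstract (not the released code; see the GitHub link above for the reference implementation): the k nearest neighbors in the transformed space are split according to their distances to the two extreme members of the set, and the label is voted within the cluster closest to the test sample.

```python
import numpy as np
from collections import Counter

def elite_knn_label(test_x, train_X, train_y, k=7):
    # Distances in the metric-learned (transformed) feature space
    d = np.linalg.norm(train_X - test_x, axis=1)
    idx = np.argsort(d)[:k]                           # k nearest neighbors
    near, far = train_X[idx[0]], train_X[idx[-1]]     # two extreme members of the set
    to_near = np.linalg.norm(train_X[idx] - near, axis=1)
    to_far = np.linalg.norm(train_X[idx] - far, axis=1)
    elite = idx[to_near <= to_far]                    # cluster closer to the test sample
    return Counter(train_y[elite]).most_common(1)[0][0]
```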
... In the case of random gaze, time plays a crucial role in understanding the chronological order of the attention pattern of the individual [7]. Accurate localization of the eyes is likewise crucial for gaze fixation [8]. The absence of iris and retinal data, due to coarse resolution in most surveillance videos, complicates the gaze analysis problem. ...
Conference Paper
Full-text available
This paper presents a novel perspective on human attention span modeling based on gaze estimation from head pose data extracted from videos. This is achieved by devising specialized 2D visualization plots that capture gaze progression and gaze sustenance over time. In doing so, a low-resolution analysis is assumed, as is the case with most crowd surveillance videos, wherein retinal analysis and iris pattern extraction of individual subjects are impossible. The information is useful for studies involving the random gaze behavior pattern of humans in a crowded place, or in a controlled environment such as seminars or office meetings. The extraction of useful information regarding the attention span of the individual from the spatial and temporal analysis of gaze points is the subject of study in this paper. Different solutions, ranging from temporal gaze plots to sustained attention span graphs, are investigated, and the results are compared with existing techniques of attention span modeling and visualization.
... Since then, much research has been directed towards developing more efficient machine learning algorithms for deciphering human emotions from facial expressions [17]. Automated facial expression analysis usually proceeds by segmenting the skin pixels, locating the eye, nose or mouth coordinates, and cropping the face, followed by normalization to a standard size [18][19][20][21][22][23][24]. Automated face cropping tools such as the Viola-Jones face detector [25] are readily downloadable from the web. ...
Article
Full-text available
In this paper we investigate information-theoretic image coding techniques that assign longer codes to improbable, imprecise and non-distinct intensities in the image. The variable length coding techniques, when applied to cropped facial images of subjects with different facial expressions, highlight the set of low-probability intensities that characterize the facial expression, such as the creases in the forehead, the widening of the eyes and the opening and closing of the mouth. A new coding scheme based on maximum entropy partitioning is proposed in our work, particularly to identify the improbable intensities related to different emotions. The improbable intensities, when used as a mask, decode the facial expression correctly, providing an effective platform for future emotion categorization experiments.
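An illustrative sketch of the underlying idea: intensities are assigned code lengths of -log2(p) from the image histogram, and pixels whose intensities receive long codes form the mask. The paper's maximum entropy partitioning scheme is not reproduced here, and the threshold below is illustrative.

```python
import numpy as np

def improbable_intensity_mask(gray, code_length_threshold=8.0):
    """gray: 2D uint8 facial image; returns a boolean mask over improbable pixels."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    with np.errstate(divide="ignore"):
        code_len = -np.log2(p)  # rarer intensities receive longer codes; p = 0 maps to inf
    return code_len[gray] > code_length_threshold
```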
... Automated face recognition is usually a challenging task since it involves the pre-processing steps of skin segmentation, location of the eye, nose and mouth coordinates, and face cropping and normalization to a standard size [1][2][3]. Automated face cropping tools such as the Viola-Jones face detector [4] are, however, now readily downloadable from the web. The pre-processing steps are followed by the extraction of suitable facial features from the cropped and normalized face image and subsequent classification into subject classes. ...
Conference Paper
Full-text available
We define a new set of spatio-temporal features called the 3D-Difference Theoretic Texture Features (3D-DTTF) for dynamic face recognition from videos. The Difference Theoretic Texture Features (DTTF) is a low-dimensional 2D scale-, rotation- and illumination-invariant texture descriptor set which reported high accuracies for texture recognition experiments in [6]. The 3D-DTTF extends the gray-level difference statistics along the Front (F), Front-Diagonal Horizontal (FDH) and Front-Diagonal Vertical (FDV) directions in addition to the existing horizontal, vertical and diagonal directions in the two-dimensional DTTF. The new 3D features are affine-invariant similar to their 2D counterpart, a property useful for recognizing faces in a video irrespective of the change in facial expressions. Experimentation on the Cohn-Kanade facial expression video dataset yields higher accuracy for the proposed dynamic face recognition as compared to other methods.
... Recently, SIFT features have even been used for stitching images into panoramas [9][10]. SIFT descriptors have been used for face recognition in [11][12], but this requires knowledge of the face geometry and of facial landmarks such as the eyes and mouth, which can be either manually annotated or automatically detected [13]. In [11], three methodologies for SIFT-based face recognition are discussed: 1) the distances between all pairs of keypoints of the test and training images are computed and the minimum distance is used to assign the test image to a class, 2) the use of features associated with the eyes and mouth only, and 3) the use of a regular grid for localizing features. ...
... It is a characteristic of the product function that a significant drop in any one of its inputs pulls the product low, and we use this to detect a match through either of our two fuzzy measures in Eq. (8) and (11). We define an error function between the two fuzzy classifiers in Eq. (8) and (11), which together constitute the product classifier in Eq. (13). The error is defined as the absolute difference between the sum of fuzzy memberships and the sum of entropy-weighted fuzzy memberships, as shown below. ...
... Fig. 2(b), for the same subject but with a different background, is correctly classified by the entropy-weighted classifier in (11) as predicted, while Fig. 2(g), with a very strong facial expression match, is correctly identified by the fuzzy classifier in (8) but not by the entropy-weighted fuzzy classifier. In both these cases the product SIFT classifier in Eq. (13) gives the correct result since it combines the goodness of both Eq. (8) and (11). Tables 2 and 3 summarize the classification results for all 100 test images for the six methods: the three proposed fuzzy SIFT classifiers, David Lowe's method [1], Leonardo's SIFT classifier [23] and SPDA for single-template face recognition [31]. ...
Article
Full-text available
A fuzzy match index for scale-invariant feature transform (SIFT) features is proposed in this study that cumulatively involves all the test SIFT keypoints in the decision-making process. The new fuzzy SIFT classifier is adapted successfully for robust face recognition from complex backgrounds without using any face cropping tools and using only a single training template. The further incorporation of entropy weights ensures that the facial features have a greater role in the soft decision-making as compared with the background features. The highlights of the authors' work are: (i) the development of a novel, highly efficient fuzzy SIFT descriptor matching tool; (ii) the incorporation of feature entropy weights to highlight the contribution of facial features; (iii) the application to robust face recognition from uncropped images having diverse backgrounds with a single template for each subject. The authors thus allow for weak supervision of the face recognition experiment and obtain high accuracy for 20 subjects of the CALTECH-256 face database, 133 subjects of the Labeled Faces in the Wild dataset and 994 subjects of the FERET database, with state-of-the-art comparisons indicating the supremacy of the authors' approach. Link to code: https://in.mathworks.com/matlabcentral/fileexchange/82933-fuzzy-sift-keypoint-matching
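A rough sketch of the product combination described in the contexts and abstract above: a plain fuzzy score over the test keypoints (Eq. (8) in the paper), an entropy-weighted score (Eq. (11)), their product as the final match score (Eq. (13)), and the error as the absolute difference of the two. The exact equations are not reproduced here and the normalization is an assumption.

```python
import numpy as np

def product_match_score(memberships, entropy_weights):
    """memberships: per-keypoint fuzzy match values in [0, 1];
    entropy_weights: per-keypoint weights emphasizing facial over background features."""
    fuzzy_score = memberships.mean()              # plain fuzzy measure
    weighted_score = np.sum(entropy_weights * memberships) / np.sum(entropy_weights)
    product_score = fuzzy_score * weighted_score  # drops if either measure drops
    error = abs(fuzzy_score - weighted_score)     # disagreement between the two measures
    return product_score, error
```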
Conference Paper
Full-text available
Direct classification of normalized and flattened 3D facial landmarks reconstructed from 2D images is proposed in this paper for recognizing eight types of facial expressions depicting the emotions of sadness, anger, contempt, disgust, fear, happiness, neutral and surprise. The first stage is the 3D projection of 2D facial landmarks. The pre-trained convolutional Face Alignment Network (FAN) proposed recently for 2D/3D face alignment is used for the purpose. The 3D Cartesian coordinates are translated to the spherical coordinate system prior to the classification stage. A variety of classifiers are tried and tested for classifying the 68 facial landmarks, for both coordinate systems. The benchmark CK+ video dataset is used for the experimentation; the last frame of each video, which depicts the peak of each emotion, is used as the input image. The experimental results indicate that direct classification of normalized and flattened 3D facial landmarks in the spherical coordinate system yields the highest accuracy for the support vector machine classifier with grid search for determining optimal parameters.
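A minimal sketch of the coordinate translation mentioned above: the 68 3D landmarks are converted from Cartesian (x, y, z) to spherical (r, theta, phi) coordinates and flattened into a single feature vector. The normalization and angle conventions of the paper are assumed rather than reproduced.

```python
import numpy as np

def landmarks_to_spherical_features(landmarks_xyz):
    """landmarks_xyz: (68, 3) array of 3D facial landmarks, e.g. from FAN."""
    x, y, z = landmarks_xyz[:, 0], landmarks_xyz[:, 1], landmarks_xyz[:, 2]
    r = np.sqrt(x ** 2 + y ** 2 + z ** 2)
    theta = np.arccos(np.clip(z / np.maximum(r, 1e-12), -1.0, 1.0))  # polar angle
    phi = np.arctan2(y, x)                                           # azimuth
    return np.column_stack([r, theta, phi]).ravel()                  # 68 x 3 = 204-dim feature vector
```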
Chapter
Tele-medicine systems run the risk of unauthorized access to medical records, and there is a greater possibility of unlawful sharing of sensitive patient information, including that of children, possibly showing their private parts. Aside from violating their right to privacy, such practices discourage patients from subjecting themselves to tele-medicine. The authors thus present an automatic identity concealment system for pictures, as designed in the GetBetter tele-medicine system developed under a WHO/TDR grant. Based on open-source face- and eye-detection algorithms, identity concealment is executed by blurring the eye region of a detected face using pixel shuffling. This method is shown to be effective not only in concealing the identity of the patient, but also in preserving the exact distribution of pixel values in the image. This is useful when subsequent image processing techniques are employed, such as when identifying the type of lesions based on images of the skin.
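A simple sketch of such concealment by pixel shuffling, assuming the eye bounding box comes from an eye detector (this is an illustration, not the GetBetter implementation): permuting the pixels inside the region scrambles it while leaving the image's overall pixel-value distribution unchanged.

```python
import numpy as np

def shuffle_eye_region(image, eye_box, rng=None):
    """image: H x W (x C) array; eye_box: (x, y, w, h) from an eye detector."""
    rng = rng if rng is not None else np.random.default_rng()
    x, y, w, h = eye_box
    roi = image[y:y + h, x:x + w].copy()
    # Flatten to one row per pixel so channels stay together when shuffling
    flat = roi.reshape(-1, roi.shape[-1]) if roi.ndim == 3 else roi.reshape(-1, 1)
    rng.shuffle(flat)                          # in-place permutation of the pixel rows
    image[y:y + h, x:x + w] = flat.reshape(roi.shape)
    return image
```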