Figure 2 - uploaded by Filippo Vella
Example of detected face and corresponding rectified image 

Source publication
Conference Paper
Full-text available
In this paper we present a novel approach to personal photo album management allowing the end user to efficiently access the collection without any need for tedious manual annotation or indexing of the photos. The proposed work exploits methods and technology from the field of computer vision and pattern recognition for face detection, face represe...

Context in source publication

Context 1
... at least three couples of points, the six values a, b, c, d, e, f can be calculated by least squares. Faces where the feature detector failed to work with a high degree of confidence were rejected. Unfortunately, face detection as well as facial feature detection is error prone, so in many cases it is not possible to obtain meaningful faces from generic images. Even worse, in some cases the SVMs estimate wrong facial features with high confidence, leading to non-significant face data. Fig. 2 shows an example of an automatically detected face and the corresponding rectified image. In fig. 3 a few rectified faces are reported. Note that, even though the faces are heavily distorted, the identity of the depicted people is still evident and the faces are reasonably aligned to allow for appearance-based similarity search. 14 Once a face has been detected and successfully rectified and cropped, a 20-dimensional face descriptor is computed. The descriptor is a vector w containing the projection of the rectified and cropped face in a subspace of the global face space. In practice the average face Ψ is subtracted from the 100 × 80 cropped and rectified face Γ_i, and the obtained image Φ is then projected on the eigenspace to obtain w_i = e_i^T Φ. The average face Ψ and the 16 eigenimages e_i associated with the largest eigenvalues are shown in fig. 4. The face space, as well as the average face, is learned off-line on a significant subset of the image collection and is not updated. At any time, if most of the faces present in the image collection differ significantly from the training set, it is possible to build a new face space and effortlessly recompute the projection of each detected, rectified and cropped face in the new face space. In our experience we learned the face space with about 500 images. 
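The projection step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the eigenimages and average face are assumed to have already been learned off-line (e.g. by PCA on a training subset of roughly 500 images), and all variable names are illustrative.

```python
import numpy as np

def face_descriptor(face, mean_face, eigenfaces):
    """Project a rectified 100x80 face onto a learned eigenspace.

    face:       (100, 80) rectified and cropped face image (Gamma_i)
    mean_face:  (100, 80) average face (Psi), learned off-line
    eigenfaces: (K, 8000) rows are the K eigenimages e_i with largest eigenvalues
    returns:    (K,) descriptor with components w_i = e_i^T (face - mean_face)
    """
    phi = (face - mean_face).reshape(-1)   # subtract average face, flatten to Phi
    return eigenfaces @ phi                # one coefficient per eigenimage

# Toy example with a 20-dimensional descriptor, as in the text
rng = np.random.default_rng(0)
mean_face = rng.random((100, 80))
eigenfaces = rng.random((20, 100 * 80))
w = face_descriptor(rng.random((100, 80)), mean_face, eigenfaces)
assert w.shape == (20,)
```

In a real pipeline the rows of `eigenfaces` would be orthonormal PCA eigenvectors, so the projection is a simple matrix-vector product.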
Other information, such as the size and the position of the originally detected face in the image or the reliability of the rectification process, 3 is also stored for future use but is not exploited in the current version of the system. The largest part of the semantic information in a personal photo is conveyed in the areas where faces appear; the remaining part of the image is the context of the scene. Each picture is processed with the face detector 13 to select the areas containing faces. These areas are approximated with bounding boxes and are dealt with as seen in the previous section. The remaining part of the image is then processed as background (see fig. 1). Following the same approach, more complex or additional detectors can be added to the system to extract further objects of interest in the scene and operate a different figure/background segmentation (e.g. a detector for the entire body could easily be integrated into the system). The background is processed to capture relevant patterns able to characterize, with a coarse classification, the context of the scene depicted in the image. The representation of the background is dealt with via visual symbols or tokens usually referred to as visual terms. These terms, introduced by Duygulu et al., 15 are used to represent visual content in a way similar to the documents-versus-words representation. The association of labels to background patterns, expressed as a function of visual terms, is performed with a supervised approach using a Maximal Figure of Merit (MFoM) classifier. 16 It is a classifier based on a Linear Discriminant Function (LDF) that is trained to optimize a chosen figure of merit (e.g. precision, recall, F1 measure, ...) and has been employed in automatic image annotation in 17. 4 A visual feature describes image content with a sequence of values that can be interpreted as the projection of the image in the feature space. The distribution of the feature values tends to have a multimodal density in the vector space. 
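The figure/background split described above can be sketched as a simple masking step; this is an assumption about the mechanics (the paper does not give code), and the function name and box format are illustrative:

```python
import numpy as np

def split_faces_background(image, face_boxes):
    """Separate face regions from background (minimal sketch).

    image:      (H, W, 3) picture
    face_boxes: list of (x, y, w, h) bounding boxes from a face detector
    returns:    list of face crops, plus the image with face areas masked out
    """
    background = image.copy()
    faces = []
    for (x, y, w, h) in face_boxes:
        faces.append(image[y:y + h, x:x + w])   # crop face region
        background[y:y + h, x:x + w] = 0        # exclude it from the background
    return faces, background

img = np.ones((120, 160, 3), dtype=np.uint8)
faces, bg = split_faces_background(img, [(10, 20, 40, 50)])
assert faces[0].shape == (50, 40, 3)
assert bg[20:70, 10:50].sum() == 0
```

Additional detectors (e.g. a whole-body detector) would simply contribute further boxes to `face_boxes`-style lists, leaving the rest of the pipeline unchanged.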
The centroids corresponding to the different modes are considered as forming a basis for data representation and are called visual terms; any image can be represented as a function of these points. Visual terms can be used to map single feature values, using in this case a representation simply based on unigrams, or they can be used considering structured forms such as spatial bigrams or even more complex structures. The data-driven approach for the extraction of visual terms allows them to emerge from the data set and builds generic sets of symbols with a representation power that is limited only by the coverage of the training set. Although k-means is typically used for the extraction of visual terms, 15, 18 in this work the extraction has been achieved by applying vector quantization to the entire set of characteristic vectors. In particular, the codebooks are produced with the LBG algorithm of Linde et al., 19 ensuring a lower computational cost and a limited quantization error. A single feature captures particular information of the image dataset according to its characteristics. The statistics of a feature in an image depend on the feature itself and are a function of its statistical occurrence in the image. For example, if A = {A_1, A_2, ..., A_M} is the set of M visual terms for the feature A, each image is represented by a vector V = (v_1, v_2, ..., v_M) where the i-th component accounts for the statistics of the term A_i in the image. Furthermore, the representation of the visual content can be enriched by exploiting spatial information. Fig. 5 shows the usage of bigrams for an image partitioned with a regular grid. Each element is represented with a visual term identified as X. Using bigrams leads to better results than using unigrams, although at the cost of a higher dimensionality of the image representation. 
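The term-assignment step can be sketched as follows: each feature vector is mapped to its nearest codebook centroid, and the per-image vector V collects the term statistics. The codebook is assumed to have been learned off-line (the paper uses the LBG algorithm; here only the nearest-neighbour assignment is shown, with illustrative names):

```python
import numpy as np

def unigram_representation(features, codebook):
    """Map feature vectors to nearest visual terms and histogram them.

    features: (N, D) feature vectors extracted from one image
    codebook: (M, D) visual-term centroids (learned off-line, e.g. via LBG)
    returns:  (M,) normalized counts v_i of term A_i in the image
    """
    # Squared distance of every feature vector to every centroid
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    terms = d.argmin(axis=1)   # index of the nearest visual term per feature
    counts = np.bincount(terms, minlength=len(codebook)).astype(float)
    return counts / counts.sum()

codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
feats = np.array([[0.1, 0.0], [0.9, 1.1], [1.0, 0.9], [0.0, 0.2]])
v = unigram_representation(feats, codebook)
assert np.allclose(v, [0.5, 0.5])   # two features fall near each centroid
```

With LBG-trained codebooks the assignment rule is the same; only the way the centroids are produced differs from plain k-means.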
As an example of a bigram-based representation, considering a codebook for a single feature formed by M elements, the image representation can be built by placing in a vector the unigram-based representation followed by the bigram-based representation. The total dimension of the vector in this case is M · M + M. For a codebook of 64 elements the total dimension of the representation is 4160, for 128 elements it is 16512, and so on. Obviously, the complexity of the visual information is captured more reliably if more characteristics, as orthogonal as possible, are used together. A simple way to represent visual patterns is to extract features related to color information and to texture, create composed features as the juxtaposition of the values of the two features, and extract a visual vocabulary from the entire set of features. In this case visual terms will take into account the composition of color and texture features for the described ...
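The dimensionality arithmetic above can be made concrete with a small sketch. The choice of horizontally adjacent grid cells as the bigram pairs is an assumption for illustration (the paper only says spatial bigrams over a regular grid), and the names are illustrative:

```python
import numpy as np

def unigram_bigram_representation(grid, M):
    """Build the M + M*M vector from a grid of visual-term indices.

    grid: (R, C) integer array; each cell holds the visual term of that patch
    M:    codebook size
    returns: vector of length M + M*M (unigram counts, then bigram counts)
    """
    uni = np.bincount(grid.reshape(-1), minlength=M).astype(float)
    bi = np.zeros((M, M))
    # Count bigrams over horizontally adjacent patches (one possible choice)
    for left, right in zip(grid[:, :-1].reshape(-1), grid[:, 1:].reshape(-1)):
        bi[left, right] += 1
    return np.concatenate([uni, bi.reshape(-1)])

grid = np.array([[0, 1], [1, 1]])           # a tiny 2x2 grid of term indices
v = unigram_bigram_representation(grid, M=64)
assert v.shape == (64 + 64 * 64,)           # 4160, matching the text
```

For M = 128 the same formula gives 128·128 + 128 = 16512 components, as stated above, which is why bigram representations trade accuracy against a much larger vector.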

Similar publications

Article
Full-text available
Third party tracking is the practice by which third parties recognize users across different websites as they browse the web. Recent studies show that 90% of websites contain third party content that is tracking its users across the web. Website developers often need to include third party content in order to provide basic functionality. However,...
Article
Full-text available
The RAGE research project will provide access to a wide range of software assets for Applied Gaming (AG) enabling AG software developers to better collaborate, share their knowledge, and be able to react to new requirements and trends more efficiently. But, RAGE is facing a greater challenge which is Information Overload (IO) because of the permane...
Article
Full-text available
Ethics and Customer Knowledge in the Field of E-Advertising. In the e-advertising sector, customer knowledge is a major competitive stake. However, this customer knowledge depends on multiple factors, among which is the Internet user's own control of his digital identity. To address strategies of unveiling of one's self on the net, t...
Book
Full-text available
In this work, based on the doctoral thesis entitled "Decision Support Systems in the Knowledge Society", we treat theoretical and practical aspects related to decision-making in the economic environment and beyond. In the first part, an introduction to the thesis research is given, the research being imposed even by the title and by assuming a series...
Article
Full-text available
Finding experts for a given problem is recognized as a difficult task. Even when a taxonomy of subject expertise exists and is associated with a group of experts, it can be hard to exploit for users who have not internalized the taxonomy. Here we present a method for both attaching experts to a domain ontology and hiding this fact from the end use...

Citations

... Even from the pattern a user adopts when viewing content on a screen, his activities can be detected and classified. In [7] the regularities and characteristics of the user's gaze activities are clustered by means of the mean shift algorithm optimizing an entropy-based figure of merit, similarly to what has been done in [8] and [9]. ...
Chapter
The rising number of elderly people motivates research into systems able to monitor and support people inside their domestic environment. An automatic system capturing data about the position of a person in the house, through accelerometers and RGBd cameras, can monitor the person's activities and produce outputs associating the movements to given tasks or predicting the set of activities that will be executed. For the task of classifying the activities we considered a Deep Convolutional Neural Network. We compared two different deep networks and analyzed their outputs.
... Users can tag faces by associating a label to the whole face group. A similar approach is followed also in [3][4][5][6]. However, these methods apply clustering techniques to get a coarse face partition later refined by applying post-processing steps. ...
... On the one hand, processing faces in the "wild" is not a solved problem, and new face descriptors and/or learning algorithms are needed to enhance identity recognition; on the other hand, new methods aiming to minimize the user's interactions are required, moving from semi-automatic to fully automatic photo organization. Many previously proposed papers [3][4][5] use clustering methods to group faces, each cluster representing an identity. All these methods do not explicitly consider the mutual exclusivity constraint. ...
Article
Due to the widespread use of cameras, it is very common to collect thousands of personal photos. A proper organization is needed to make the collection usable and to enable easy photo retrieval. In this paper, we present a method to organize personal photo collections based on “who” is in the picture. Our method consists of detecting the faces in the photo sequence and arranging them in groups corresponding to probable identities. This problem can be conveniently modeled as multi-target visual tracking, where a set of on-line trained classifiers is used to represent the identity models. In contrast to other works where clustering methods are used, our method relies on a probabilistic framework; it does not require any prior information about the number of different identities in the photo album. To enable future comparison, we present experimental results on a public dataset and on a photo collection generated from a public face dataset.
... A particular case is, for example, photo collection organization. In such an application, Content Based Image Retrieval (CBIR) techniques and face features can be integrated in a probabilistic framework to define clusters of photos in order to ease browsing the collection [1]. In some works [5,9,19], once the face is detected, the region under the face is used to compute information about the clothing of the person. ...
... apply Canny detector to compute edges in the image
discard edges in flat areas by applying an adaptive threshold on local standard deviations
set orientations to 0 for all pixels in the image
for each pixel on an edge do
  compute orientation, assign a value in [1,5] depending on the estimated direction
end for
for each pixel (x, y) do
  compute orientation histogram in neighborhood N = {(x−s, y−s); (x+s, y+s)}
end for
return as Primitives the orientation histograms ...
Conference Paper
Full-text available
Appearance description is a relevant field in computer vision that enables object recognition in domains such as re-identification, retrieval and classification. Important cues to describe appearance are colors and textures. However, in real cases, texture detection is challenging due to occlusions and to deformations of the clothing as the person's pose changes. Moreover, in some cases, the processed images have a low resolution and state-of-the-art methods for texture analysis are not appropriate. In this paper, we deal with the problem of localizing real textures for clothing description purposes, such as stripes and/or complex patterns. Our method uses the entropy of the primitive distribution to measure whether a texture is present in a region and applies a quad-tree method for texture segmentation. We performed experiments on a publicly available dataset and compared to a state-of-the-art method [16]. Our experiments showed that our method has satisfactory performance.
... Our point is that personal photo libraries show peculiar characteristics compared to general image collections, namely the presence of people in most of the images and a relatively small number of different individuals across the whole library, that allow reliable results to be achieved with automatic approaches. 2 In particular, in personal photo collections the user is mainly interested in who is in the picture (usually a relatively small number of different individuals) and where and when the picture was shot. Who, where and when are the fundamental aspects of photo information, and input images can be intrinsically split into three domains of interest. ...
Conference Paper
Full-text available
In this paper we present a novel approach for personal photo album management. Pictures are analyzed and described in three representation spaces, namely, faces, background and time of capture. Faces are automatically detected and rectified using a probabilistic feature extraction technique. The face representation is then produced by computing PCA (Principal Component Analysis). Backgrounds are represented with low-level visual features based on an RGB histogram and a Gabor filter bank. Temporal data is obtained through the extraction of EXIF (Exchangeable image file format) data. Each image in the collection is then automatically organized using a mean-shift clustering technique. While many systems manage faces and typically allow queries about them, we use a common approach to manage multiple aspects; that is, queries regarding people, time and background are dealt with in a homogeneous way. We report experimental results on a realistic set, i.e., a personal photo album, of about 2000 images, where automatic detection and rectification of faces lead to approximately 800 faces. The significance of the clustering has been evaluated and the results are very interesting.
... Our point is that personal libraries stored on mobile devices show peculiar characteristics compared to general image collections, namely the presence of people in most of the images and a relatively small number of different individuals across the whole library, that allow reliable results to be achieved with automatic approaches [1] [2]. ...
... The user then chooses the local images to be classified and sends them to the remote server. The remote server runs the algorithm on the images and sends back the classification terns as in (1). The MIDlet then moves the pictures to the corresponding folders and adds some suitable metadata to them, if possible (depending on the image file type). ...
Conference Paper
Full-text available
People make more and more use of digital image acquisition devices to capture snapshots of their everyday life. The growing number of personal pictures raises the problem of their classification. Some of the authors proposed an automatic technique for personal photo album management dealing with multiple aspects (i.e., people, time and background) in a homogeneous way. In this paper we discuss a solution that allows mobile users to remotely access such a technique by means of their mobile phones, almost from everywhere, in a pervasive fashion. This allows users to classify the pictures they store on their devices. The whole solution is presented, with particular regard to the user interface implemented on the mobile phone, along with some experimental results.
... The key point is the representation of each image with multiple descriptors in a form suitable for clustering. An image can be represented in several spaces, allowing different aspects of the input data to be captured [22]. In the proposed system, each image in the collection is represented with features related to the presence of faces in the image, features characterizing the background, and time information. ...
Article
Full-text available
In this paper a novel approach for the automatic representation of pictures on mobile devices is proposed. With the wide diffusion of mobile digital image acquisition devices, the need for managing a large number of digital images is quickly increasing. In fact, the storage capacity of such devices allows users to store hundreds or even thousands of pictures that, without a proper organization, become useless. Users may be interested in using (i.e., browsing, saving, printing and so on) a subset of the stored data according to some particular picture properties. A content-based description of each picture is needed to perform on-board image indexing. In our work, the images are analyzed and described in three representation spaces, namely, faces, background and time of capture. Faces are automatically detected, and a face representation is produced by projecting the face itself into a common low-dimensional eigenspace. Backgrounds are represented with low-level visual features based on an RGB histogram and a Gabor filter bank. Temporal data is obtained through the extraction of EXIF (Exchangeable Image File Format) data. The faces, background and time information of each image in the collection is automatically organized using a mean-shift clustering technique. The significance of the clustering has been evaluated on a realistic set of about 1000 images and the results are promising.
... An image can be represented in several spaces, allowing different aspects of the input data to be captured. 2 In the following sections the processing of visual information in the three chosen representation spaces is described. Faces are preprocessed to reduce the variation in appearance and are mapped into an automatically emerging space by employing eigenfaces. ...
Article
Full-text available
We propose a novel approach for the automatic representation of pictures achieving a more effective organization of personal photo albums. Images are analyzed and described in multiple representation spaces, namely, faces, background, and time of capture. Faces are automatically detected, rectified, and represented, projecting the face itself in a common low-dimensional eigenspace. Backgrounds are represented with low-level visual features based on an RGB histogram and Gabor filter bank. Faces, time, and background information of each image in the collection is automatically organized using a mean-shift clustering technique. Given the particular domain of personal photo libraries, where most of the pictures contain faces of a relatively small number of different individuals, clusters tend to be semantically significant besides containing visually similar data. We report experimental results based on a data set of about 1000 images where automatic detection and rectification of faces lead to approximately 400 faces. Significance of clustering has been evaluated, and results are very encouraging.
... In the proposed approach, each image in the collection is represented by the presence of faces and by visual background features [13]. A data-oriented clustering allows aggregation structures to be generated, driven by the regularities in the represented data. ...
Conference Paper
In this paper we propose a probabilistic approach for the automatic organization of pictures in a personal photo album. Images are analyzed in terms of faces and low-level visual features of the background. The description of the background is based on an RGB color histogram and on Gabor filter energy accounting for texture information. The face descriptor is obtained by projecting the detected and rectified faces onto a common low-dimensional eigenspace. The vectors representing faces and background are clustered in an unsupervised fashion exploiting a mean shift clustering technique. We observed that, given the peculiarity of the domain of personal photo libraries, where most of the pictures contain faces of a relatively small number of different individuals, clusters tend to be not only visually but also semantically significant. Experimental results are reported.
Conference Paper
In the present work a system able to classify indoor actions is presented. The data are recorded with multiple kinds of sensors, collecting the position of the joints of the person in the room, the acceleration recorded on the person's wrist, and the presence or absence in a specific room. Latent semantic analysis, based on the principal component search, allows the probability of a given action to be estimated according to the sampled values.
Conference Paper
The paper describes a system for human-machine interaction that is able to identify users according to how they look at the monitor while using a given interface. The system does not need invasive measurements that could limit the naturalness of the user's actions, and it detects eye movements from the estimation provided by a Kinect camera. The proposed approach clusters the sequences of the user's gaze on the screen, characterizing the user's identity according to the particular pattern his/her gaze follows. The possibility of identifying people through gaze movement introduces a new perspective on human-machine interaction. For example, a user can obtain different contents according to his recorded preferences, and a software application can modify its interface to meet the preferences of a given user.