Figure 2 - uploaded by Filippo Vella
Example of detected face and corresponding rectified image 

Source publication
Conference Paper
Full-text available
In this paper we present a novel approach to personal photo album management allowing the end user to efficiently access the collection without any need for tedious manual annotation or indexing of the photos. The proposed work exploits methods and technology from the field of computer vision and pattern recognition for face detection, face represe...

Context in source publication

Context 1
... at least three couples of points, the six values a, b, c, d, e, f can be calculated by least squares. Faces where the feature detector failed to work with a high degree of confidence were rejected. Unfortunately, face detection as well as facial feature detection is error prone, so in many cases it is not possible to obtain meaningful faces from generic images. Even worse, in some cases the SVMs estimate wrong facial features with high confidence, leading to non-significant face data. Fig. 2 shows an example of an automatically detected face and the corresponding rectified image. In fig. 3 a few rectified faces are reported. Note that, even though the faces are heavily distorted, the identity of the depicted people is still evident and the faces are reasonably aligned to allow for appearance-based similarity search. 14 Once a face has been detected and successfully rectified and cropped, a 20-dimensional face descriptor is computed. The descriptor is a vector w containing the projection of the rectified and cropped face in a subspace of the global face space. In practice the average face Ψ is subtracted from the 100 × 80 cropped and rectified face Γ_i, and the obtained image Φ is then projected on the eigenspace to obtain w_i = e_i^T Φ. The average face Ψ and the 16 eigenimages e_i associated with the largest eigenvalues are shown in fig. 4. The face space, as well as the average face, is learned off-line on a significant subset of the image collection and is not updated. At any time, if most of the faces present in the image collection differ significantly from the training set, it is possible to build a new face space and effortlessly recompute the projection of each detected, rectified and cropped face in the new face space. In our experience we learned the face space with about 500 images. 
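The projection step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the eigenimages and average face are assumed to have already been learned off-line (e.g. by PCA on a training subset of roughly 500 images), and all variable names are illustrative.

```python
import numpy as np

def face_descriptor(face, mean_face, eigenfaces):
    """Project a rectified 100x80 face onto a learned eigenspace.

    face:       (100, 80) rectified and cropped face image (Gamma_i)
    mean_face:  (100, 80) average face (Psi), learned off-line
    eigenfaces: (K, 8000) rows are the K eigenimages e_i with largest eigenvalues
    returns:    (K,) descriptor with components w_i = e_i^T (face - mean_face)
    """
    phi = (face - mean_face).reshape(-1)   # subtract average face, flatten to Phi
    return eigenfaces @ phi                # one coefficient per eigenimage

# Toy example with a 20-dimensional descriptor, as in the text
rng = np.random.default_rng(0)
mean_face = rng.random((100, 80))
eigenfaces = rng.random((20, 100 * 80))
w = face_descriptor(rng.random((100, 80)), mean_face, eigenfaces)
assert w.shape == (20,)
```

In a real pipeline the rows of `eigenfaces` would be orthonormal PCA eigenvectors, so the projection is a simple matrix-vector product.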
Other information, such as the size and the position of the originally detected face in the image or the reliability of the rectification process, 3 is also stored for future use but is not exploited in the current version of the system. The largest part of the semantic information in a personal photo is conveyed in the areas where faces appear; the remaining part of the image is the context of the scene. Each picture is processed with the face detector 13 to select the areas containing faces. These areas are approximated with bounding boxes and are dealt with as seen in the previous section. The remaining part of the image is then processed as background (see fig. 1). Following the same approach, more complex or additional detectors can be added to the system to extract further objects of interest in the scene and operate a different figure/background segmentation (e.g. a detector for the entire body could easily be integrated into the system). The background is processed to capture relevant patterns able to characterize, with a coarse classification, the context of the scene depicted in the image. The representation of the background is dealt with via visual symbols or tokens usually referred to as visual terms. These terms, introduced by Duygulu et al., 15 are used to represent visual content in a way similar to the documents-versus-words representation. The association of labels to background patterns, expressed as a function of visual terms, is performed with a supervised approach using a Maximal Figure of Merit (MFoM) classifier. 16 It is a classifier based on a Linear Discriminant Function (LDF) that is trained to optimize a chosen figure of merit (e.g. precision, recall, F1 measure, ...) and has been employed in automatic image annotation in 17. 4 A visual feature describes image content with a sequence of values that can be interpreted as the projection of the image in the feature space. The distribution of the feature values tends to have a multimodal density in the vector space. 
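The figure/background split described above can be sketched as a simple masking step; this is an assumption about the mechanics (the paper does not give code), and the function name and box format are illustrative:

```python
import numpy as np

def split_faces_background(image, face_boxes):
    """Separate face regions from background (minimal sketch).

    image:      (H, W, 3) picture
    face_boxes: list of (x, y, w, h) bounding boxes from a face detector
    returns:    list of face crops, plus the image with face areas masked out
    """
    background = image.copy()
    faces = []
    for (x, y, w, h) in face_boxes:
        faces.append(image[y:y + h, x:x + w])   # crop face region
        background[y:y + h, x:x + w] = 0        # exclude it from the background
    return faces, background

img = np.ones((120, 160, 3), dtype=np.uint8)
faces, bg = split_faces_background(img, [(10, 20, 40, 50)])
assert faces[0].shape == (50, 40, 3)
assert bg[20:70, 10:50].sum() == 0
```

Additional detectors (e.g. a whole-body detector) would simply contribute further boxes to `face_boxes`-style lists, leaving the rest of the pipeline unchanged.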
The centroids corresponding to the different modes are considered as forming a basis for data representation and are called visual terms; any image can be represented as a function of these points. Visual terms can be used to map single feature values, using in this case a representation simply based on unigrams, or they can be used considering structured forms such as spatial bigrams or even more complex structures. The data-driven approach for the extraction of visual terms allows them to emerge from the data set and builds generic sets of symbols with a representation power that is limited only by the coverage of the training set. Although k-means is typically used for the extraction of visual terms, 15, 18 in this work the extraction has been achieved by applying vector quantization to the entire set of characteristic vectors. In particular, the codebooks are produced with the LBG algorithm of Linde et al., 19 ensuring a lower computational cost and a limited quantization error. A single feature captures particular information of the image dataset according to its characteristics. The statistics of a feature in an image depend on the feature itself and are a function of its statistical occurrence in the image. For example, if A = {A_1, A_2, ..., A_M} is the set of M visual terms for the feature A, each image is represented by a vector V = (v_1, v_2, ..., v_M) where the i-th component accounts for the statistics of the term A_i in the image. Furthermore, the representation of the visual content can be enriched by exploiting spatial information. Fig. 5 shows the usage of bigrams for an image partitioned with a regular grid. Each element is represented with a visual term identified as X. Using bigrams leads to better results than using unigrams, although at the cost of a higher dimensionality of the image representation. 
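The term-assignment step can be sketched as follows: each feature vector is mapped to its nearest codebook centroid, and the per-image vector V collects the term statistics. The codebook is assumed to have been learned off-line (the paper uses the LBG algorithm; here only the nearest-neighbour assignment is shown, with illustrative names):

```python
import numpy as np

def unigram_representation(features, codebook):
    """Map feature vectors to nearest visual terms and histogram them.

    features: (N, D) feature vectors extracted from one image
    codebook: (M, D) visual-term centroids (learned off-line, e.g. via LBG)
    returns:  (M,) normalized counts v_i of term A_i in the image
    """
    # Squared distance of every feature vector to every centroid
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    terms = d.argmin(axis=1)   # index of the nearest visual term per feature
    counts = np.bincount(terms, minlength=len(codebook)).astype(float)
    return counts / counts.sum()

codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
feats = np.array([[0.1, 0.0], [0.9, 1.1], [1.0, 0.9], [0.0, 0.2]])
v = unigram_representation(feats, codebook)
assert np.allclose(v, [0.5, 0.5])   # two features fall near each centroid
```

With LBG-trained codebooks the assignment rule is the same; only the way the centroids are produced differs from plain k-means.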
As an example of a bigram-based representation, considering a codebook for a single feature formed by M elements, the image representation can be built by placing in a vector the unigram-based representation followed by the bigram-based representation. The total dimension of the vector in this case is M · M + M. For a codebook of 64 elements the total dimension of the representation is 4160, for 128 elements it is 16512, and so on. Obviously, the complexity of the visual information is captured more reliably if more characteristics, as orthogonal as possible, are used together. A simple way to represent visual patterns is to extract features related to color information and to texture, create composed features as the juxtaposition of the values of the two features, and extract a visual vocabulary from the entire set of features. In this case visual terms will take into account the composition of color and texture features for the described ...
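The dimensionality arithmetic above can be made concrete with a small sketch. The choice of horizontally adjacent grid cells as the bigram pairs is an assumption for illustration (the paper only says spatial bigrams over a regular grid), and the names are illustrative:

```python
import numpy as np

def unigram_bigram_representation(grid, M):
    """Build the M + M*M vector from a grid of visual-term indices.

    grid: (R, C) integer array; each cell holds the visual term of that patch
    M:    codebook size
    returns: vector of length M + M*M (unigram counts, then bigram counts)
    """
    uni = np.bincount(grid.reshape(-1), minlength=M).astype(float)
    bi = np.zeros((M, M))
    # Count bigrams over horizontally adjacent patches (one possible choice)
    for left, right in zip(grid[:, :-1].reshape(-1), grid[:, 1:].reshape(-1)):
        bi[left, right] += 1
    return np.concatenate([uni, bi.reshape(-1)])

grid = np.array([[0, 1], [1, 1]])           # a tiny 2x2 grid of term indices
v = unigram_bigram_representation(grid, M=64)
assert v.shape == (64 + 64 * 64,)           # 4160, matching the text
```

For M = 128 the same formula gives 128·128 + 128 = 16512 components, as stated above, which is why bigram representations trade accuracy against a much larger vector.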

Similar publications

Article
Full-text available
Third party tracking is the practice by which third parties recognize users across different websites as they browse the web. Recent studies show that 90% of websites contain third party content that is tracking its users across the web. Website developers often need to include third party content in order to provide basic functionality. However,...
Article
Full-text available
The RAGE research project will provide access to a wide range of software assets for Applied Gaming (AG) enabling AG software developers to better collaborate, share their knowledge, and be able to react to new requirements and trends more efficiently. But, RAGE is facing a greater challenge which is Information Overload (IO) because of the permane...
Article
Full-text available
Ethics and Customer Knowledge in the Field of E-Advertising. In the e-advertising sector, customer knowledge is a major competitive stake. However, this customer knowledge depends on multiple factors, among which is the Internet user's own control of his digital identity. To address strategies of unveiling of one's self on the net, t...
Book
Full-text available
In this work, based on the doctoral thesis entitled "Decision Support Systems in the Knowledge Society", we treat theoretical and practical aspects related to decision-making in the economic environment and beyond. In the first part, an introduction to the thesis research is given, the research being imposed even by the title and by assuming a series...
Article
Full-text available
Finding experts for a given problem is recognized as a difficult task. Even when a taxonomy of subject expertise exists and is associated with a group of experts, it can be hard to exploit for users who have not internalized the taxonomy. Here we present a method for both attaching experts to a domain ontology and hiding this fact from the end use...

Citations

... Even from the pattern a user adopts when viewing content on a screen, his activities can be detected and classified. In [7] the regularities and characteristics of the user's gaze activities are clustered by means of the mean shift algorithm optimizing an entropy-based figure of merit, similarly to what has been done in [8] and [9]. ...
Chapter
The rising number of elderly people motivates research into systems able to monitor and support people inside their domestic environment. An automatic system capturing data about the position of a person in the house, through accelerometers and RGBd cameras, can monitor the person's activities and produce outputs associating the movements to given tasks or predicting the set of activities that will be executed. For the task of classifying the activities we considered a Deep Convolutional Neural Network. We compared two different deep networks and analyzed their outputs.
... Users can tag faces by associating a label to the whole face group. A similar approach is followed also in [3][4][5][6]. However, these methods apply clustering techniques to get a coarse face partition later refined by applying post-processing steps. ...
... On the one hand, processing faces in the "wild" is not a solved problem, and new face descriptors and/or learning algorithms are needed to enhance identity recognition; on the other hand, new methods aiming to minimize the user's interactions are required, moving from semi-automatic to fully automatic photo organization. Many previously proposed papers [3][4][5] use clustering methods to group faces, each cluster representing an identity. All these methods do not explicitly consider the mutual exclusivity constraint. ...
Article
Due to the widespread use of cameras, it is very common to collect thousands of personal photos. A proper organization is needed to make the collection usable and to enable easy photo retrieval. In this paper, we present a method to organize personal photo collections based on “who” is in the picture. Our method consists of detecting the faces in the photo sequence and arranging them in groups corresponding to probable identities. This problem can be conveniently modeled as multi-target visual tracking, where a set of on-line trained classifiers is used to represent the identity models. In contrast to other works where clustering methods are used, our method relies on a probabilistic framework; it does not require any prior information about the number of different identities in the photo album. To enable future comparison, we present experimental results on a public dataset and on a photo collection generated from a public face dataset.
... A particular case is, for example, photo collection organization. In such an application, Content Based Image Retrieval (CBIR) techniques and face features can be integrated in a probabilistic framework to define clusters of photos in order to ease browsing the collection [1]. In some works [5,9,19], once the face is detected, the region under the face is used to compute information about the clothing of the person. ...
... apply Canny detector to compute edges in the image
discard edges in flat areas by applying an adaptive threshold on local standard deviations
set orientations to 0 for all pixels in the image
for each pixel on an edge do
  compute orientation, assign a value in [1,5] depending on the estimated direction
end for
for each pixel (x, y) do
  compute orientation histogram in neighborhood N = {(x−s, y−s); (x+s, y+s)}
end for
return as Primitives the orientation histograms ...
Conference Paper
Full-text available
Appearance description is a relevant field in computer vision that enables object recognition in domains such as re-identification, retrieval and classification. Important cues to describe appearance are colors and textures. However, in real cases, texture detection is challenging due to occlusions and to deformations of the clothing as the person's pose changes. Moreover, in some cases, the processed images have a low resolution and state-of-the-art methods for texture analysis are not appropriate. In this paper, we deal with the problem of localizing real textures for clothing description purposes, such as stripes and/or complex patterns. Our method uses the entropy of the primitive distribution to measure whether a texture is present in a region and applies a quad-tree method for texture segmentation. We performed experiments on a publicly available dataset and compared to a state-of-the-art method [16]. Our experiments showed that our method has satisfactory performance.
... Our point is that personal photo libraries show peculiar characteristics compared to general image collections, namely the presence of people in most of the images and a relatively small number of different individuals across the whole library, that allow reliable results to be achieved with automatic approaches. 2 In particular, in personal photo collections the user is mainly interested in who is in the picture (usually a relatively small number of different individuals) and where and when the picture was shot. Who, where and when are the fundamental aspects of photo information, and input images can be intrinsically split into three domains of interest. ...
Conference Paper
Full-text available
In this paper we present a novel approach for personal photo album management. Pictures are analyzed and described in three representation spaces, namely, faces, background and time of capture. Faces are automatically detected and rectified using a probabilistic feature extraction technique. The face representation is then produced by computing PCA (Principal Component Analysis). Backgrounds are represented with low-level visual features based on an RGB histogram and a Gabor filter bank. Temporal data is obtained through the extraction of EXIF (Exchangeable image file format) data. Each image in the collection is then automatically organized using a mean-shift clustering technique. While many systems manage faces and typically allow queries about them, we use a common approach to manage multiple aspects; that is, queries regarding people, time and background are dealt with in a homogeneous way. We report experimental results on a realistic set, i.e., a personal photo album, of about 2000 images, where automatic detection and rectification of faces lead to approximately 800 faces. The significance of the clustering has been evaluated and the results are very interesting.
... Our point is that personal libraries stored on mobile devices show peculiar characteristics compared to general image collections, namely the presence of people in most of the images and a relatively small number of different individuals across the whole library, that allow reliable results to be achieved with automatic approaches [1] [2]. ...
... The user then chooses the local images to be classified and sends them to the remote server. The remote server runs the algorithm on the images and sends back the classification terns as in (1). The MIDlet then moves the pictures to the corresponding folders and adds some suitable metadata to them, if possible (depending on the image file type). ...
Conference Paper
Full-text available
People make more and more use of digital image acquisition devices to capture snapshots of their everyday life. The growing number of personal pictures raises the problem of their classification. Some of the authors proposed an automatic technique for personal photo album management dealing with multiple aspects (i.e., people, time and background) in a homogeneous way. In this paper we discuss a solution that allows mobile users to remotely access such a technique by means of their mobile phones, almost from everywhere, in a pervasive fashion. This allows users to classify the pictures they store on their devices. The whole solution is presented, with particular regard to the user interface implemented on the mobile phone, along with some experimental results.
... The key point is the representation of each image with multiple descriptors in a form suitable for clustering. An image can be represented in several spaces, allowing different aspects of the input data to be captured [22]. In the proposed system, each image in the collection is represented with features related to the presence of faces in the image, features characterizing the background, and time information. ...
Article
Full-text available
In this paper a novel approach for the automatic representation of pictures on mobile devices is proposed. With the wide diffusion of mobile digital image acquisition devices, the need for managing a large number of digital images is quickly increasing. In fact, the storage capacity of such devices allows users to store hundreds or even thousands of pictures that, without a proper organization, become useless. Users may be interested in using (i.e., browsing, saving, printing and so on) a subset of the stored data according to some particular picture properties. A content-based description of each picture is needed to perform on-board image indexing. In our work, the images are analyzed and described in three representation spaces, namely, faces, background and time of capture. Faces are automatically detected, and a face representation is produced by projecting the face itself into a common low-dimensional eigenspace. Backgrounds are represented with low-level visual features based on an RGB histogram and a Gabor filter bank. Temporal data is obtained through the extraction of EXIF (Exchangeable Image File Format) data. The faces, background and time information of each image in the collection is automatically organized using a mean-shift clustering technique. The significance of the clustering has been evaluated on a realistic set of about 1000 images and the results are promising.
... An image can be represented in several spaces, allowing different aspects of the input data to be captured. 2 In the following sections the processing of visual information in the three chosen representation spaces is described. Faces are preprocessed to reduce the variation in appearance and are mapped into an automatically emerging space by employing eigenfaces. ...
Article
Full-text available
We propose a novel approach for the automatic representation of pictures achieving a more effective organization of personal photo albums. Images are analyzed and described in multiple representation spaces, namely, faces, background, and time of capture. Faces are automatically detected, rectified, and represented, projecting the face itself in a common low-dimensional eigenspace. Backgrounds are represented with low-level visual features based on an RGB histogram and Gabor filter bank. Faces, time, and background information of each image in the collection is automatically organized using a mean-shift clustering technique. Given the particular domain of personal photo libraries, where most of the pictures contain faces of a relatively small number of different individuals, clusters tend to be semantically significant besides containing visually similar data. We report experimental results based on a data set of about 1000 images where automatic detection and rectification of faces lead to approximately 400 faces. Significance of clustering has been evaluated, and results are very encouraging.
... In the proposed approach, each image in the collection is represented by the presence of faces and by visual background features [13]. A data-oriented clustering allows aggregation structures to be generated, driven by the regularities in the represented data. ...
Conference Paper
In this paper we propose a probabilistic approach for the automatic organization of pictures in a personal photo album. Images are analyzed in terms of faces and low-level visual features of the background. The description of the background is based on an RGB color histogram and on Gabor filter energy accounting for texture information. The face descriptor is obtained by projecting the detected and rectified faces onto a common low-dimensional eigenspace. The vectors representing faces and background are clustered in an unsupervised fashion exploiting a mean shift clustering technique. We observed that, given the peculiarity of the domain of personal photo libraries, where most of the pictures contain faces of a relatively small number of different individuals, clusters tend to be not only visually but also semantically significant. Experimental results are reported.
Conference Paper
In the present work a system able to classify indoor actions is presented. The data are recorded with multiple kinds of sensors, collecting the position of the joints of the person in the room, the acceleration recorded on the person's wrist, and the presence or absence in a specific room. Latent semantic analysis, based on the principal component search, allows the probability of a given action to be estimated according to the sampled values.
Conference Paper
The paper describes a system for human-machine interaction that is able to identify users according to how they look at the monitor while using a given interface. The system does not need invasive measurements that could limit the naturalness of the user's actions, and it detects eye movements from the estimation provided by a Kinect camera. The proposed approach clusters the sequences of the user's gaze on the screen, characterizing the user's identity according to the particular pattern his/her gaze follows. The possibility of identifying people through gaze movement introduces a new perspective on human-machine interaction. For example, a user can obtain different contents according to his recorded preferences, and a software application can modify its interface to meet the preferences of a given user.