Figure 5 - uploaded by Mohamed Reda Bouadjenek
Figure 5.2 - Example of a folksonomy with eight users who annotate one resource using seven tags. The triples (u, t, r) are represented as ternary edges connecting a user, a tag, and a resource.

Source publication
Thesis
Nowadays, the Web has evolved from a static Web where users were only able to consume information, to a Web where users are also able to produce information. This evolution is commonly known as Social Web or Web 2.0. Social platforms and networks are certainly the most adopted technologies in this new era. These platforms are commonly used as a mea...

Contexts in source publication

Context 1
... a document, each user has his own understanding of its content. Therefore, each user employs a different vocabulary and words to describe, comment on, and annotate this document (see Figure 5.1). For example, if we look at the homepage of YouTube, a given user can tag it using "video", "Web" and "music" while another can tag it using "news", "movie", and "media". ...
Context 2
... approach we are proposing relies on users' annotations as a source of social information, which are associated with documents in bookmarking systems. As illustrated in Figure 5.1, the textual content of a document is shared between users under a common representation, i.e. all terms in a document are identically shared and presented to users as in a classic IR model, while the annotations given by a user to this document express his personal understanding of its content. Thus, these annotations symbolize a personal representation of this document to this user, e.g. the red annotations given by Bob to the document express his personal representation of this document, while green annotations constitute the personal representation of this document to Alice since she used them to describe its content. ...
Context 3
... consider the web page YouTube.com as a document that matches this query. This web page is associated with many bookmarks in a folksonomy, as illustrated in Figure 5.2. There are eight users (Alice, Bob, Carol, Eve, Mallory, Nestor, Oscar, and Trudy) who annotated YouTube.com using seven tags (info, web, video, news, blog, social, and mine). ...
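A folksonomy like the one in this excerpt can be held as a plain list of (user, tag, resource) triples. The following is a minimal sketch, not the thesis's data: only the triple for Bob and "video" is grounded in the excerpts, the other assignments are hypothetical placeholders, and `tags_by_user` is an illustrative helper name.

```python
from collections import defaultdict

# Each bookmark is a (user, tag, resource) triple.
# Only ("Bob", "video", "youtube.com") comes from the excerpts; the other
# user/tag pairings are hypothetical placeholders for illustration.
triples = [
    ("Bob", "video", "youtube.com"),
    ("Alice", "web", "youtube.com"),
    ("Alice", "video", "youtube.com"),
    ("Carol", "news", "youtube.com"),
]


def tags_by_user(triples, resource):
    """Group the tags that each user applied to one resource."""
    result = defaultdict(set)
    for user, tag, res in triples:
        if res == resource:
            result[user].add(tag)
    return dict(result)


profile = tags_by_user(triples, "youtube.com")
```

Here `profile` maps each user to the set of tags that user applied to the page, which is the per-user view the later steps build on.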
Context 4
... information is reused to create its PSDR according to the query issuer. These phases are the following, as illustrated in Figure 5.3: ...
Context 5
... thinks that info is associated to youtube.com with a weight of 0.5. This phase includes four steps enumerated from 1 to 4 in Figure 5.3. ...
Context 6
... a matrix factorization process is used to infer the PSDR of the considered document to the query issuer based on identifying weighting patterns. This phase corresponds to step 5 in Figure 5.3. ...
Context 7
... Finally, ranking documents based on their PSDR and their textual content. This phase is illustrated in steps 6 and 7 in Figure 5.3. ...
Context 8
... objective in this first step is to gather as much useful information as possible around the user and the relatives who may serve to construct and enrich the PSDR. As illustrated in Figure 5.3, each web page can be represented using an $m \times n$ Users-Tags matrix $M^{d}_{U,T}$ of $m$ users who annotate the web page and the $n$ tags that they used to annotate it. Each entry $w_{ij}$ in the matrix represents the number of times the user $u_i$ used the term $t_j$ to annotate the considered web page. ...
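The count-based Users-Tags matrix described in this excerpt can be sketched as follows. This is an illustration under assumptions: the function name `build_users_tags_matrix`, the alphabetical ordering of rows and columns, and the sample annotations (apart from Bob using "video" once) are not taken from the thesis.

```python
def build_users_tags_matrix(annotations):
    """Build a count matrix from (user, tag) pairs for a single web page.

    Returns (users, tags, matrix), where matrix[i][j] is the number of
    times users[i] used tags[j] to annotate the page.
    """
    users = sorted({user for user, _ in annotations})
    tags = sorted({tag for _, tag in annotations})
    user_index = {user: i for i, user in enumerate(users)}
    tag_index = {tag: j for j, tag in enumerate(tags)}

    matrix = [[0] * len(tags) for _ in users]
    for user, tag in annotations:
        matrix[user_index[user]][tag_index[tag]] += 1
    return users, tags, matrix


# Hypothetical annotations of one page; only (Bob, video) is from the excerpts.
annotations = [
    ("Bob", "video"),
    ("Alice", "web"),
    ("Alice", "video"),
    ("Carol", "news"),
]
users, tags, M = build_users_tags_matrix(annotations)
```

With this input the rows are ordered Alice, Bob, Carol and the columns news, video, web, so the entry for Bob and video is 1.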
Context 9
... 5.2. In the folksonomy of Figure 5.2, Bob used the term video to annotate the web page Youtube.com once. ...
Context 10
... of using all users' feedback to infer a PSDR of the considered web page to Bob, we propose to choose only the most representative ones in order to filter out irrelevant users who may represent noise. To do so, we use a ranking function to rank users from the most relevant to the least relevant, and select only the top k users as the most representative of both the query issuer and the considered web page (see Step 2 of Figure 5.3). The irrelevant users may: ...
Context 11
... we select only the terms that the top k users employed to annotate this web page and build a new reduced Users-Tags matrix, which is expected to be more representative of both the query issuer and the considered web page (see Step 3 in Figure 5.3). ...
Context 12
... we select their tags to build a new (smaller) Users-Tags matrix $M^{d}_{U,T}$. Finally, we add the query issuer as a new entry in the Users-Tags matrix $M^{d}_{U,T}$ as well as his tags, if any (see step 3 of Figure 5.3). Once the matrix is built, we proceed to the computation of the weights associated with each cell, as detailed in the following. ...
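Steps 2 and 3 as described in these excerpts (keep the top k users, restrict the tags to theirs, then add the query issuer) might look like the sketch below. The ranking function itself is not given in the excerpts, so the `scores` dictionary stands in for it with made-up values; `reduce_matrix` and the sample data are likewise hypothetical.

```python
def reduce_matrix(user_tags, scores, k, issuer):
    """Keep the k best-scored users, then append the query issuer's row.

    user_tags: dict mapping user -> {tag: count} for one web page.
    scores:    dict mapping user -> relevance score (placeholder for the
               ranking function, which is not specified in the excerpts).
    """
    candidates = [user for user in user_tags if user != issuer]
    top_users = sorted(candidates, key=lambda u: scores[u], reverse=True)[:k]
    rows = top_users + [issuer]

    # Only tags used by the retained users (and the issuer, if any) survive.
    tags = sorted({tag for user in rows for tag in user_tags.get(user, {})})
    matrix = [[user_tags.get(user, {}).get(tag, 0) for tag in tags]
              for user in rows]
    return rows, tags, matrix


# Hypothetical tag counts and scores, for illustration only.
user_tags = {
    "Alice": {"web": 1, "video": 1},
    "Carol": {"news": 1},
    "Eve": {"blog": 1, "social": 1},
    "Bob": {"video": 1},
}
scores = {"Alice": 0.9, "Carol": 0.2, "Eve": 0.6}
rows, tags, M = reduce_matrix(user_tags, scores, k=2, issuer="Bob")
```

Carol is dropped because her score is lowest, so her tag "news" no longer appears as a column; Bob, the query issuer, is appended as the last row.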
Context 13
... 5.5. For factorizing the Users-Tags matrix of Figure 5.3 using two dimensions, we have the two multi-variable matrices: ...
Context 14
... main computation effort for generating a PSDR of a document is in building the Users-Tags matrix and factorizing it (Steps 1 to 5 in Figure 5.3). The time complexity needed for building a Users-Tags matrix is ...
Context 15
... which corresponds to ranking users to select the most representative ones (step 2 in Figure 5.3). For factorizing the matrix, the main computation of the gradient descent algorithm is evaluating the objective function L in Equation 5.6 and its derivatives in Equations 5.7 and 5.8 (see Algorithm 5.1). ...
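To make the cost of factorization concrete, here is a minimal gradient-descent sketch for a two-dimensional factorization. It is an assumption-laden stand-in, not Equation 5.6: the objective below is a plain squared-error loss with L2 regularization, and the social regularization terms mentioned in the excerpts are omitted because their form is not given here. The names `factorize`, `loss`, `lam`, and `lr`, and all numeric settings, are illustrative.

```python
import random


def predict(U, V, i, j):
    """Dot product of user i's and tag j's latent vectors."""
    return sum(a * b for a, b in zip(U[i], V[j]))


def loss(M, U, V, lam):
    """Squared reconstruction error plus L2 penalty (assumed objective)."""
    err = sum((M[i][j] - predict(U, V, i, j)) ** 2
              for i in range(len(M)) for j in range(len(M[0])))
    reg = sum(x * x for row in U for x in row)
    reg += sum(x * x for row in V for x in row)
    return 0.5 * err + 0.5 * lam * reg


def init_factors(n_rows, n_cols, dims, seed):
    """Small random starting values for both factor matrices."""
    rng = random.Random(seed)
    U = [[rng.uniform(0.0, 0.1) for _ in range(dims)] for _ in range(n_rows)]
    V = [[rng.uniform(0.0, 0.1) for _ in range(dims)] for _ in range(n_cols)]
    return U, V


def factorize(M, dims=2, lam=0.01, lr=0.05, steps=500, seed=0):
    """Full-batch gradient descent on the assumed objective."""
    U, V = init_factors(len(M), len(M[0]), dims, seed)
    for _ in range(steps):
        # Gradients start with the regularization part.
        grad_U = [[lam * x for x in row] for row in U]
        grad_V = [[lam * x for x in row] for row in V]
        for i in range(len(M)):
            for j in range(len(M[0])):
                e = predict(U, V, i, j) - M[i][j]
                for d in range(dims):
                    grad_U[i][d] += e * V[j][d]
                    grad_V[j][d] += e * U[i][d]
        U = [[x - lr * g for x, g in zip(row, grow)]
             for row, grow in zip(U, grad_U)]
        V = [[x - lr * g for x, g in zip(row, grow)]
             for row, grow in zip(V, grad_V)]
    return U, V


# Hypothetical 3-user x 4-tag count matrix.
M = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 0]]
U0, V0 = init_factors(len(M), len(M[0]), 2, seed=0)
before = loss(M, U0, V0, lam=0.01)
U, V = factorize(M)
after = loss(M, U, V, lam=0.01)
```

Each iteration touches every cell of the matrix once per latent dimension, which is why evaluating the objective and its derivatives dominates the running time.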
Context 16
... an illustration, Figure 5.4 shows the execution time needed for processing queries according to the number of documents that they match w.r.t. ...
Context 17
... queries and the users were randomly selected 10 times independently, and we report the average results each time. As depicted in Figure 5.4, none of these parameters has an impact on the execution time. ...
Context 18
... parameter is illustrated in Figure 5.5. The results show that the best performance is obtained when selecting 1 or 2 related users, depending on the ranking function and the retrieval process used. ...
Context 19
... Mean Reciprocal Rank. Figure 5.5 - Impact of the number of users. ...
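Mean Reciprocal Rank, the metric named in this excerpt, averages over queries the reciprocal of the rank at which the first relevant document appears. The sketch below uses the standard definition; the function name and the sample rankings are hypothetical and do not reproduce the thesis's evaluation data.

```python
def mean_reciprocal_rank(rankings, relevant_sets):
    """Average of 1/rank of the first relevant document per query.

    A query with no relevant document in its ranking contributes 0.
    """
    total = 0.0
    for ranking, relevant in zip(rankings, relevant_sets):
        for rank, doc in enumerate(ranking, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(rankings)


# Two hypothetical queries: first relevant hit at rank 2 and at rank 1.
rankings = [["d1", "d2", "d3"], ["d4", "d5", "d6"]]
relevant_sets = [{"d2"}, {"d4"}]
mrr = mean_reciprocal_rank(rankings, relevant_sets)
```

For these two queries the reciprocal ranks are 1/2 and 1, so the mean is 0.75.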
Context 20
... results of this parameter are illustrated in Figure 5.6. This parameter controls the weight of the social regularization terms of the objective function given in Equation 5.6. ...
Context 21
... results of this parameter are illustrated in Figure 5.7. The optimal value is obtained for γ ∈ [0.6, 0.9], a value which we consider as a tradeoff between the personalized and the non-personalized parts. ...
Context 22
... results of this parameter are illustrated in Figure 5.8. This parameter controls the balance between the social and the document proximity parts when computing the ranking scores for users in Equation 5.2. ...
Context 23
... results of this parameter are illustrated in Figure 5.9. As one can see, the cosine similarity measure provides better retrieval performance by discriminating between users more effectively. ...
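For reference, cosine similarity between two users can be computed over their tag-count vectors as sketched below. Which vectors the thesis actually compares is not stated in this excerpt, so treating each user as a sparse {tag: count} dictionary is an assumption, as are the function name and the sample profiles.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse {tag: count} vectors."""
    shared = a.keys() & b.keys()
    dot = sum(a[tag] * b[tag] for tag in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty profile is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


# Hypothetical tag profiles.
bob = {"video": 1}
alice = {"web": 1, "video": 1}
carol = {"news": 1}
sim_bob_alice = cosine_similarity(bob, alice)
sim_bob_carol = cosine_similarity(bob, carol)
```

Bob and Alice share one of Alice's two tags, giving a similarity of 1/√2, while Bob and Carol share none and score 0.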
Context 24
... results of the comparison are illustrated in Figure 5.10, while varying γ. ...
Context 25
... illustrated in Figure 5.10, the obtained results show that our approach is much more efficient than all the non-personalized approaches for all values of γ. Hence, we conclude that the personalization efforts introduced by our approach in the representation of documents with respect to each user bring a considerable improvement of the search quality. ...
Context 26
... experimental results are shown in Figure 5.11 over the 10 classes of queries. ...
Context 27
... process is performed by the evaluator without knowing which algorithm generated the result. Figure 5.12 shows the interface that the users obtained when they participated in the survey. ...
Context 28
... P@7 calculation, we considered any positive judgment as relevant. The obtained results are shown in Figure 5.13 as measured by NDCG@7 and P@7. ...
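The two metrics named in this excerpt can be sketched as follows. P@7 counts any positive judgment as relevant, as the excerpt states. For NDCG@7 the excerpt does not say which variant is used, so this sketch assumes the linear-gain form (grade divided by log2 of rank + 1) and an ideal ranking built from the same judged list; the grades are hypothetical.

```python
import math


def precision_at_k(grades, k=7):
    """Fraction of the top-k results with a positive judgment."""
    return sum(1 for grade in grades[:k] if grade > 0) / k


def dcg(grades):
    """Discounted cumulative gain with linear gain (assumed variant)."""
    return sum(grade / math.log2(rank + 1)
               for rank, grade in enumerate(grades, start=1))


def ndcg_at_k(grades, k=7):
    """DCG of the top-k results divided by the DCG of the ideal ordering."""
    ideal = dcg(sorted(grades, reverse=True)[:k])
    if ideal == 0:
        return 0.0
    return dcg(grades[:k]) / ideal


# Hypothetical graded judgments for the 7 results of one query
# (0 = not relevant, higher = more relevant).
grades = [2, 0, 1, 0, 0, 1, 0]
p7 = precision_at_k(grades)
n7 = ndcg_at_k(grades)
```

Three of the seven results have a positive grade, so P@7 is 3/7; NDCG@7 falls below 1 because two relevant results sit lower than they would in the ideal ordering.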

Citations

... M. Bouadjenek (2016) [20] proposed a framework for enhancing information retrieval. The proposed framework exploits annotations as part of resource analysis, in addition to the resource's content. ...
... This is in addition to considering users' annotations and participation as part of web resource analysis in the SRR, as in M. Bouadjenek (2016) [20] and D. Yong (2011) [8]. ...
... The first consists of enriching document content by indexing both the textual content and the associated social content, such as annotations and comments [13,16,14,17,7]. The second lies in building personalized indexes to represent, comment on, and tag documents according to each user's own vocabulary [18,19]. Exploiting the social context to improve information retrieval is the goal of the work of Karweg et al. [34], who presented an approach for measuring social relevance based either on the user's degree of interaction with a web resource (clicks, ratings, comments, …) or on each user's credibility, measured through their social network graph using the PageRank algorithm, which quantifies popularity. ...
... Each user employs their own vocabulary to describe, comment on, and annotate this document. Consequently, the solution is to create personalized indexes [208, 31]. ...
Thesis
Our work falls within the context of social information retrieval (SIR) and focuses in particular on exploiting user-generated content in the information retrieval process. User-generated content (UGC) refers to a set of data (e.g., social signals) whose content is mainly either produced or directly influenced by end users. It stands in contrast to traditional content produced, sold, or distributed by professionals. The term has been popular since 2005 in Web 2.0 circles and in new social media. This movement reflects the democratization of the means of production and interaction on the Web, made possible by new technologies. Among these means, which are increasingly accessible to a broad public, are social networks, blogs, microblogs, wikis, etc. Most information retrieval systems exploit two classes of sources of evidence to rank the documents that answer a query. The first, and the most widely used, is query-dependent: it covers all features related to the distribution of the query terms in the document and in the collection (tf-idf). The second class covers query-independent factors: it measures a kind of a priori quality or importance of the document. These factors include PageRank, the topical locality of the document, the presence of URLs in the document, its authors, etc. Another important source that can be exploited to measure the interest of a Web page, or more generally of a resource, is the social Web. Indeed, thanks to the tools offered by Web 2.0, users increasingly interact with one another and/or with resources.
These interactions (social signals), expressed as annotations, comments, or votes associated with resources, can be regarded as additional information that can help measure the a priori importance of a resource in terms of popularity and reputation, independently of the query. We also assume that the impact of a social signal depends on time, that is, on the date on which the user's action took place. We consider that recent signals should have a greater impact than older ones when computing the importance of a resource. The recency of signals may indicate recent interest in the resource. Next, we consider that the number of signals of a resource must be weighed against the age (publication date) of that resource. In general, a resource that has existed for a long time is likely to have many more signals than a recent one, which penalizes recent resources relative to older ones. Finally, we also propose to take into account the diversity of social signals within a resource. Keywords: Information retrieval, Social networks, User-generated content, Social signals, Social properties, Time, Diversity.
Article
In recent decades, researchers have realized that social networks are important sources for adhering to the evolution of many aspects of Information Retrieval (IR). These social networks have produced vast amounts of important information that is not covered by traditional IR systems. This improvement, which has become one of the main applications of IR, offers several social features such as conversational exchange, the sharing of opinions by users, and the association of users with the same interests. This work introduces a model of Social Information Research that takes into account social information about users and exploits it as a second source of information for a given query. In the proposed IR system, the quality of results is improved by analyzing users' needs and by comparing other users' social data.