Table 2 - uploaded by Steffen Bickel
Examples of the taxonomy mapping after applying the semi-automatic assignment.


Source publication
Article
Full-text available
The performance of search engines crucially depends on their ability to capture the meaning of a query most likely intended by the user. We study the problem of mapping a search engine query to those nodes of a given subject taxonomy that characterize its most likely meanings. We describe the architecture of a classification system that uses a web...

Contexts in source publication

Context 1
... directory node is assigned up to three categories of the subject taxonomy, resulting in an n : m mapping between the taxonomies. Some examples of the resulting mapping table are shown in Table 2. The web directory organizes country specific pages in the same structure as the main taxonomy. ...
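The mapping table can be pictured as a plain lookup from directory nodes to taxonomy categories. A minimal sketch in Python, with hypothetical node and category names (the paper's actual table contains 763 such rules):

```python
# Hypothetical excerpt of the n:m mapping table: each web-directory node
# is assigned up to three subject-taxonomy categories.
MAPPING = {
    "Computers/Internet/Searching": ["Computers\\Internet & Intranet",
                                     "Information\\Companies & Industries"],
    "Shopping/Vehicles": ["Living\\Car & Garage",
                          "Shopping\\Buying Guides & Researching"],
}

def taxonomy_categories(directory_node):
    """Look up the subject-taxonomy categories mapped to a directory node."""
    return MAPPING.get(directory_node, [])
```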
Context 2
... mapping recommendation system posts these queries to the web directory search engine; the categories of retrieved web pages are added to a candidate list of recommended mappings. After manually inspecting these recommendations for the KDD Cup task and accepting some 200 of them, we obtain a mapping that entails 763 rules following the schema visualized in Table 2. ...
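The recommendation step described above can be sketched as follows; `directory_search` is a stand-in for the web directory's search API (query string in, list of category paths out), not an actual interface from the paper:

```python
from collections import Counter

def recommend_mappings(labeled_queries, directory_search):
    """Collect candidate (directory category -> target category) mappings
    by posting each labeled query to the directory search engine and
    recording the categories of the retrieved pages."""
    candidates = Counter()
    for query, target_category in labeled_queries:
        for directory_category in directory_search(query):
            candidates[(directory_category, target_category)] += 1
    # Frequently co-occurring pairs are the most promising candidates
    # for manual review and acceptance.
    return [pair for pair, _ in candidates.most_common()]
```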

Citations

... These signals are sought so that Google can keep improving the quality of search results. Illyes also mentioned in the podcast episode that more of Google's signals may become machine-learning-based [17]. ...
Article
Artificial intelligence (AI) and machine learning are currently considered to be uniquely widespread innovations. Artificial intelligence used to be an incredible concept from science fiction, but nowadays it is becoming a day-to-day reality. A neural network emulates the behaviour of actual neurons in the brain, paving the way toward innovations in machine learning known as deep learning. Machine learning can help us live happier, better, and more productive lives if the power of deep learning is properly utilized as an industrial revolution that harnesses mental and cognitive ability. Many current research papers deal with the application of deep learning in various real-time settings, including intelligent gaming, smart driving, and environmental protection. Across all of these applications, intelligent decisions must be made in a timely manner to improve accuracy on one end and, on the other, to manage energy consumption and system efficiency. This paper presents various applications that use deep learning for better decision making and shows how to visualize the problems in order to reach better solutions. The analysis of such real-time problems is carried out using artificial neurons with supervised and unsupervised data.
... In its first stage, we use ODP to map query to an intermediate category, and in the second stage, it adopts exact matching and SVM classifier to obtain the target query intents. A variation of the method has won the query intent classification task of KDDCUP [37]. ...
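The two-stage scheme can be sketched roughly as below; the ODP lookup and the category-to-intent table are invented for illustration, and the SVM back-off of the actual method is omitted:

```python
# Hypothetical stage-2 table: intermediate ODP-style category -> target intent.
ODP_TO_INTENT = {
    "Computers/Software": "Computers\\Software",
    "Sports/Soccer": "Sports\\Other",
}

def classify_intent(query, odp_lookup):
    """Stage 1: map the query to intermediate ODP categories (odp_lookup).
    Stage 2: resolve them to target intents by exact matching."""
    intents = [ODP_TO_INTENT[c] for c in odp_lookup(query) if c in ODP_TO_INTENT]
    return intents or ["Other"]  # an SVM classifier would back off here
```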
... The performance of chatbots and search engines strongly depends on their ability to capture user intent. Vogel et al. (2005) addressed the issue of mapping a search engine query to those nodes of a subject taxonomy that express its possible meanings. Their user-intent classification architecture uses a web directory to determine the query context from query term frequencies. ...
Article
To support a natural flow of conversation between humans and automated agents, the rhetorical structure of each message has to be analyzed. We classify a pair of paragraphs of text as appropriate or inappropriate for one to follow another, based on both topic and communicative discourse considerations. To represent a multi-sentence message with respect to how it should follow a previous message in a conversation or dialogue, we build an extension of a discourse tree for it. An extended discourse tree is based on a discourse tree for RST relations with labels for communicative actions, plus additional arcs for anaphora and ontology-based relations between entities. We refer to such trees as Communicative Discourse Trees (CDTs). We explore syntactic and discourse features that are indicative of correct vs. incorrect request-response or question-answer pairs. Two learning frameworks are used to recognize such correct pairs: deterministic nearest-neighbor learning of CDTs as graphs, and tree kernel learning of CDTs, where a feature space of all CDT sub-trees is subject to SVM learning. We form the positive training set from correct pairs obtained from Yahoo! Answers, social networks, corporate conversations including Enron emails, customer complaints, and interviews by journalists. The corresponding negative training set is created artificially by attaching responses to different, inappropriate requests that include relevant keywords. The evaluation showed that it is possible to recognize valid pairs in 70% of cases in domains of weak request-response agreement and in 80% of cases in domains of strong agreement, which is essential to support automated conversations. These accuracies are comparable with the benchmark task of classifying discourse trees themselves as valid or invalid, and with classifying multi-sentence answers in factoid question-answering systems.
The applicability of the proposed machinery to chatbots, social chats, and programming via natural language is demonstrated. We conclude that learning rhetorical structures in the form of CDTs is a key source of data for supporting complex question answering, chatbots, and dialogue management.
... Finally, their solution retrieves the top five categories returned by the neural network or the mapping of categories obtained with the two search engines. Other researchers who participated in the KDD contest published their solution [137]. They propose a similar approach that sends queries to a web directory in order to obtain a list of categories. ...
Thesis
During the last few years, technological progress in collecting, storing, and processing large quantities of data at a reasonable cost has raised serious privacy issues. Privacy concerns many areas, but is especially important in frequently used services like search engines (e.g., Google, Bing, Yahoo!). These services allow users to retrieve relevant content on the Internet by exploiting their personal data. In this context, developing solutions that enable users to use these services in a privacy-preserving way is becoming increasingly important. In this thesis, we introduce SimAttack, an attack against existing mechanisms for querying search engines in a privacy-preserving way. The attack aims at retrieving the original user query. With this attack, we show that three representative state-of-the-art solutions do not protect user privacy in a satisfactory manner. We therefore develop PEAS, a new protection mechanism that better protects user privacy. This solution leverages two types of protection: hiding the user's identity (with a succession of two nodes) and masking the user's queries (by combining them with several fake queries). To generate realistic fake queries, PEAS exploits previous queries sent by users of the system. Finally, we present mechanisms to identify sensitive queries. Our goal is to adapt existing protection mechanisms to protect sensitive queries only, and thus save user resources (e.g., CPU, RAM). We design two modules to identify sensitive queries. By deploying these modules on real protection mechanisms, we establish empirically that they dramatically improve the mechanisms' performance.
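The query-masking idea can be illustrated with a short sketch; this shows only the masking half of PEAS, with an invented interface, and omits the two-node relay that hides the user's identity:

```python
import random

def mask_query(real_query, past_queries, k=3):
    """Send the real query together with k fake queries sampled from
    previously observed queries, shuffled so the server cannot tell
    which of the k+1 queries is genuine."""
    batch = random.sample(past_queries, k) + [real_query]
    random.shuffle(batch)
    return batch
```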
... David Vogel et al. [2005] demonstrate that most web search queries contain only two or three terms and therefore provide very limited information about the user's information need to the search engine. Similar work has been carried out by Milos and Mirjana in CatS [2006]. ...
... Amrish Singh et al. (2005) proposed an approach to presenting web search results that supports personalization, taking users' perspectives into consideration. David Vogel et al. (2005) demonstrate that most web search queries contain only two or three terms and therefore provide very limited information about the user's information need to the search engine. Similar work has been carried out by Milos and Mirjana in CatS (2006). ...
... The task in KDDCUP 2005 was to automatically classify 800,000 queries into 67 predetermined categories with only 111 manually labelled queries. Most solutions submitted by the 32 teams (Kardkovács, Tikk, & Bánsághi, 2005; Vogel et al., 2005) gathered extra information to augment query terms. Although the participants were able to achieve encouraging results, with the median F1 score at 0.23 (max. ...
... It is basically a post-retrieval algorithm that uses document classification techniques to organize search results into a meaningful hierarchy of topics, based on the perspective of the user performing the search, represented as a taxonomic ontology. David Vogel et al. [8] demonstrate that most web search queries contain only two or three terms and therefore provide very limited information about the user's information need to the search engine. Utilizing this information is a key factor in constructing effective web search engines. ...
Article
Full-text available
The World Wide Web has immense resources for all kinds of people and their specific needs. Searching the Web using search engines such as Google, Bing, and Ask has become an extremely common way of locating information. Searches are formulated using individual terms or keywords, sequences of them, or short sentences. The challenge for the user is to come up with a set of search terms/keywords/sentences that is neither too large (making the search too specific and resulting in many false negatives) nor too small (making the search too general and resulting in many false positives) to get the desired result. No matter how the user specifies the search query, the results retrieved, organized, and presented by the search engines come as millions of linked pages, many of which might not be useful to the user. In fact, the end user never knows which pages exactly match the query and which do not until one checks the pages individually. This task is quite tedious, a kind of drudgery, owing to the lack of refinement and meaningful classification of the search results. Providing accurate and precise results to end users has become the Holy Grail for search engines like Google, Bing, and Ask. A number of implementations have appeared on the web to provide better results to users, such as DuckDuckGo, Yippy, and Dogpile. This research proposes the development of a meta-search engine, called WebSEReleC (Web-based SEReleC), that provides an interface for refining and classifying search engines' results so as to narrow them down in a sequentially linked manner, resulting in a drastic reduction in the number of pages, using the power of Google.
... It is also an efficient research tool because it allows users to record search results and create reports of their research easily and automatically. Other related works [7] [8] [9] [10] [11] [12] [13] have also been carried out, but none of them addresses all the problems discussed in section 2. ...
Article
Full-text available
The World Wide Web has immense resources for all kinds of people and their specific needs. Searching the Web using search engines such as Google, Bing, and Ask has become an extremely common way of locating information. Searches are formulated using individual terms or keywords, sequences of them, or short sentences. The challenge for the user is to come up with a set of search terms/keywords/sentences that is neither too large (making the search too specific and resulting in many false negatives) nor too small (making the search too general and resulting in many false positives) to get the desired result. No matter how the user specifies the search query, the results retrieved, organized, and presented by the search engines come as millions of linked pages, many of which might not be useful to the user. In fact, the end user never knows which pages exactly match the query and which do not until one checks each page individually. Providing accurate and precise results to end users has become the Holy Grail for search engines like Google, Bing, and Ask. This research proposes a meta-search engine called EGG that is intended to use the power of Google for more accurate and combinatorial search. This is achieved through simple manipulation and automation of Google functions that are accessible from EGG through Google.
... However some methods have been successful for query topic classification, e.g. utilising additional unlabeled data (Taksa et al., 2007; Beitzel, Jensen, Lewis, Chowdhury and Frieder, 2007) and bridging topic hierarchies to enable training on larger datasets (Li et al., 2005; Vogel et al., 2005; Shen et al., 2006a). As a result query topic classification can be useful in many tasks, including: ...
Article
Purpose: This work aims to investigate the sensitivity of ranking performance with respect to the topic distribution of queries selected for ranking evaluation.
Design/methodology/approach: The authors reweight queries used in two TREC tasks to make them match three real background topic distributions, and show that the performance rankings of retrieval systems are quite different.
Findings: It is found that search engines tend to perform similarly on queries about the same topic, and that search engine performance is sensitive to the topic distribution of queries used in evaluation.
Originality/value: Using experiments with multiple real-world query logs, the paper demonstrates weaknesses in the current evaluation model of retrieval systems.
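The reweighting idea can be sketched as a topic-weighted mean of per-query scores; the names and the simple averaging scheme below are illustrative, not the paper's exact protocol:

```python
from collections import defaultdict

def reweighted_score(per_query_scores, query_topics, topic_distribution):
    """Average per-query effectiveness scores within each topic, then
    combine the topic means according to a target background
    distribution of topics."""
    by_topic = defaultdict(list)
    for score, topic in zip(per_query_scores, query_topics):
        by_topic[topic].append(score)
    return sum(w * sum(by_topic[t]) / len(by_topic[t])
               for t, w in topic_distribution.items() if by_topic[t])
```

With this kind of reweighting, two systems that tie on a uniform query sample can swap ranks once queries are weighted to match a real query log's topic mix.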