Figure 4: Comparison of Search Techniques
Source publication
Article
Full-text available
This paper presents Ingrid, an architecture for a fully distributed, fully self-configuring information navigation infrastructure that is designed to scale to global proportions. Unlike current designs, Ingrid is not a hierarchy of large index servers. Rather, links are automatically placed between individual resources based on their topic similari...
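The abstract above describes Ingrid's core idea of linking resources by topic similarity. Below is a minimal, hypothetical sketch of that idea (not Ingrid's actual algorithm): it assumes each resource is reduced to a set of extracted terms, and a link is placed whenever the Jaccard similarity of two term sets crosses an arbitrary threshold.

from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two term sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def place_links(resources, threshold=0.3):
    """Yield pairs of resource URLs that should be linked by topic similarity."""
    for (u1, t1), (u2, t2) in combinations(resources.items(), 2):
        if jaccard(t1, t2) >= threshold:
            yield u1, u2

# Three invented resources, each described by a set of extracted terms.
resources = {
    "http://example.org/a": {"routing", "internet", "scaling"},
    "http://example.org/b": {"routing", "internet", "navigation"},
    "http://example.org/c": {"face", "recognition", "vision"},
}
print(list(place_links(resources)))  # [('http://example.org/a', 'http://example.org/b')]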

Contexts in source publication

Context 1
... and Harvest are just two of many examples of the latter category. As illustrated in Figure 4, the goal of Ingrid is to allow searching of the whole web, but necessarily with less depth than can be achieved with a single-database search engine. Thus, the functionality of Ingrid is complementary to that of limited-coverage single-search ...
Context 2
... examples of search engines that attempt to index all web resources are Lycos and WWWW. As shown in Figure 4, these whole-web search engines and Ingrid are attempting to do roughly the same job, and are therefore essentially competing technologies. Thus, we wish to briefly justify the work of Ingrid in light of whole-web search engines. The primary justification for work on Ingrid is scaling. It is not clear that the single-database approach will be able to keep pace with the growth of the web. So far, Lycos has apparently been able to keep pace, as it seems to consistently be indexing approximately 75% of the estimated 4 million (as of July 1995) total URLs. On one hand, 4 million documents barely scratches the surface of the total number of documents that can be expected to be available over the web in the future. On the other hand, Lycos has probably barely scratched the surface of what a "single" search-engine can do, given massive parallelism, huge memory farms, and the like. In short, the ability of a single-database search engine to index the entire web, and the associated costs, are unknown. Likewise, the ability of Ingrid to search the entire web is also unknown. Thus, it seems prudent to experiment with both ...
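As a quick back-of-the-envelope reading of the coverage figures quoted in this context:

\[
0.75 \times 4{,}000{,}000 \ \text{URLs} \approx 3{,}000{,}000 \ \text{URLs indexed by Lycos (July 1995 estimate)}
\]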

Similar publications

Article
Full-text available
Current web-based genome browsers require repetitious user input to scroll over long distances, alter the drawing density of elements or zoom through multiple orders of magnitude. Generally, either the server or the client is responsible for the majority of data processing, resulting in either servers having to receive and handle data...
Article
Full-text available
According to the characteristics of the management workflow in the department of experimental management of teaching administration, a solution for a campus experimental project management system based on Intranet technology has been designed. The solution, by adopting Internet technology and the Browser/Server system structure to analyse and d...
Article
Full-text available
Face recognition is a research field in computer vision that studies learning faces and determining the identity of a face from a picture sent to the system. By utilizing this face recognition technology, the process of learning people's identities among students in a university will become simpler. With this technology, students won't need to bro...

Citations

... The structure thus produced could be arranged hierarchically using a document clustering technique similar to Gloor's hypertext system [11], or could be arranged in a mesh of linked lists similar to the Ingrid system [9]. Each Referral server would be identified by a machine-readable abstract summarizing the scope of queries it could answer effectively. Part of this abstract might even be a piece of code that acts as a filter on incoming queries, choosing which should be directed to that Referral server. ...
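The excerpt above imagines each Referral server publishing a machine-readable abstract that may include filter code for incoming queries. Here is a hedged sketch of what such an abstract could look like; the server name, scope terms, and keyword-test filter are entirely hypothetical, since the excerpt does not say what the filter code would contain.

from dataclasses import dataclass
from typing import Callable

@dataclass
class ReferralAbstract:
    server: str          # address of the Referral server (hypothetical)
    scope: set           # topics it claims to answer effectively
    accepts: Callable    # filter code applied to incoming queries

def make_abstract(server, scope):
    # The filter below is a trivial keyword test standing in for the
    # "piece of code" mentioned in the excerpt.
    return ReferralAbstract(
        server=server,
        scope=scope,
        accepts=lambda query: any(term in query.lower() for term in scope),
    )

genomics = make_abstract("referral.genomics.example", {"genome", "sequence"})
print(genomics.accepts("genome browser scrolling"))   # True
print(genomics.accepts("campus intranet workflow"))   # False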
Article
Full-text available
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1996. Includes bibliographical references (leaves 166-169).
... A new and evolving topic is how to find elements in support of peer-to-peer communication. Ingrid [15] was an early attempt to address the problem. In Ingrid, each entity is identified only by an unordered set of attribute-value pairs. ...
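The identification scheme mentioned in this excerpt, an entity named only by an unordered set of attribute-value pairs, can be illustrated with a small sketch; the attributes and values below are invented for illustration.

def entity_id(**attrs):
    """Build an order-independent identifier from attribute-value pairs."""
    return frozenset(attrs.items())

a = entity_id(topic="routing", author="francis", lang="en")
b = entity_id(lang="en", author="francis", topic="routing")
print(a == b)                        # True: the pairs form an unordered set
print(("topic", "routing") in a)     # True: membership test on a single pair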
Article
Naïve pictures of the Internet frequently portray a small collection of hosts or LANs connected by a "cloud" of connectivity. The truth is more complex. The IP-level structure of the Internet is composed of a large number of constituent networks, each of which differs in some or all of transmission technologies, routing protocols, administrative models, security policies, QoS capabilities, pricing mechanisms, and similar attributes. On top of this, a whole new structure of application-layer overlays and content distribution networks, equally diverse in the sorts of ways mentioned above, is rapidly evolving. Virtually any horizontal slice through the current Internet structure reveals a loosely coupled federation of separately defined, operated, and managed entities, interconnected to varying degrees, and often differing drastically in internal requirements and implementation. Intuitively, it is natural to think of each of these entities as existing in a region of the network, with each region having coherent internal technology and policies, and each region managing its interactions with other regions of the net according to some defined set of rules and policies. In this paper, we propose that a key design element in an architecture for extremely large scale, wide distribution and heterogeneous networks is a grouping and partitioning mechanism we call the region. Furthermore, we postulate that such a mechanism can provide increased functionality and management of existing unresolved problems in current networks. The paper both describes a proposed definition of the region concept and explores the utility of such a mechanism through a series of examples. We claim that there is significant added benefit to generalizing the idea of the region.
... The architecture of the search engine has primarily been centralized in nature. [BDH94], Ingrid [FKS95], the Context-Sensitive Infrastructure (CSI) [FEF01], [FES02], and a Content Discovery System (CDS) based on Rendezvous Points [GS02]. ...
... There have been many attempts to provide support for the information retrieval task. Techniques and approaches include, among others, query languages [11], routing protocols [12], hypertext navigation systems [13], semantic networks [25], network management protocols [8], software agents [4,9], brokers [4,7,9], metadata management [16,17,22], file search systems [10,20], ontologies [5,18,21], federated and heterogeneous databases [5,6,23], directory systems [26,28] and mediator technology [1,2,12,19,27]. The main network retrieval methodologies nowadays are the World Wide Web (HTTP) [3] with its various search engines, along with other subsystems such as WHOIS [26], Archie [10] and Ingrid [13]. Regarding network intelligence used for retrieval, the most prominent system is Harvest [7], which is a distributed information retrieval system. ...
Article
This paper addresses the issue of locating relevant information in a network of heterogeneous, unfederated information bases of various types, including structured databases, text, audio, picture and video files. The problem is to determine where the required information resides in a network, in locations unknown to the user. The objective is to construct a user-friendly, intelligent search and routing mechanism in order to find the most relevant information bases in the network. We introduce a mechanism for presenting queries, routing queries, updating knowledge, and learning in a metaknowledge base (MKB). This has been named the metaknowledge-based intelligent routing system (MIRS). MIRS finds the location of the desired information by its ability to “understand” the user’s query and to access information by content, rather than by address. MIRS behaves like a distributed search engine, working with a distributed metaknowledge index-file. There is no need for periodic web-crawling, web-robots, or agents of any sort. The network itself encapsulates the knowledge and routing algorithms that provide the user access-by-content to the relevant information. In contrast to web servers, the MIRS servers are not linked by hypertext links, but rather by knowledge links, randomly acquired or expertly built. The system also differs from the usual search engines in that it is capable of handling different types of media (e.g., text, database, multimedia), applies natural language parsing techniques to understand the intention of the user, and can potentially use a user profile to enhance the original query before distributing it over the network. The “metadata” describing the information bases are spread across a network of routing and information servers and are modified as a result of search operations and the introduction of new information bases into the system.
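To make the access-by-content idea above concrete, here is a hedged sketch of routing a query along knowledge links held in a distributed metaknowledge index; the server names, topic keys, and forwarding rule are illustrative assumptions, not the MIRS algorithm itself.

# Each server's metaknowledge: topic -> neighbor believed to cover that topic.
ROUTING_KNOWLEDGE = {
    "mirs-a.example": {"genomics": "mirs-b.example", "video": "mirs-c.example"},
    "mirs-b.example": {"genomics": "mirs-b.example"},   # answers genomics itself
    "mirs-c.example": {"video": "mirs-c.example"},      # answers video itself
}

def route(query_topic, start, max_hops=5):
    """Follow knowledge links by query content until some server claims the topic."""
    server = start
    for _ in range(max_hops):
        nxt = ROUTING_KNOWLEDGE.get(server, {}).get(query_topic)
        if nxt is None:
            return None          # no knowledge link for this topic
        if nxt == server:
            return server        # this server answers the query itself
        server = nxt
    return None                  # give up after too many hops

print(route("genomics", "mirs-a.example"))   # mirs-b.example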
... In some projects, completely decentralized architectures for information search were developed. In the Ingrid system [Francis et al., 1995] ...
Article
Full-text available
The research field of information retrieval is confronted with entirely new challenges by the emergence of the Internet and the World Wide Web. In contrast to conventional data collections, the Internet is characterized by its immense size, high dynamism, the heterogeneity of its contents, and its distribution across computers all over the world. In order to search this information space precisely, efficiently, and comprehensively, this work proposes a concept for specializing search engines on individual topic areas. Such search engines recognize the documents relevant to them by means of a special filter function and can offer their users a topic-specific user interface and search functionality. To efficiently locate the documents relevant to a specialized search engine, mobile-program technology is employed. Instead of transferring all documents to be examined to the search engine, mobile filter programs are sent to the data collections, examine them "on site", and return only the relevant documents. Methods are presented with which the dispatch of the mobile programs can be coordinated so that the resulting communication costs are minimized. Since these dispatch methods require knowledge of the network distance between the participating computers, an approach is also presented that enables the estimation of arbitrary network distances in the Internet in a scalable and efficient manner. The viability of the concepts for estimating network distances and for dispatching mobile programs is evaluated by means of extensive measurements. In addition, a case study analyzes the benefit of the topic-specific search engines as well as the use of mobile filter programs in their context.
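The dispatch idea in the abstract above, sending mobile filter programs to the data collections and returning only relevant documents, can be sketched roughly as follows; the hosts, estimated distances, filter rule, and documents are made up for illustration.

def filter_relevant(doc):
    """Stand-in for the thesis's topic-specific filter function."""
    return "retrieval" in doc.lower()

def dispatch(hosts, corpora):
    """Visit hosts nearest-first (by estimated network distance) and filter on site."""
    relevant = []
    for host in sorted(hosts, key=hosts.get):        # cheapest hosts first
        relevant.extend(doc for doc in corpora[host] if filter_relevant(doc))
    return relevant

hosts = {"far.example": 120.0, "near.example": 8.5}   # estimated distances (ms)
corpora = {
    "near.example": ["Mobile code for information retrieval", "Campus workflow notes"],
    "far.example": ["Distributed retrieval with mobile filters"],
}
print(dispatch(hosts, corpora))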
... Some of these suggest reliance on some additional naming scheme, such as Uniform Resource Names (URNs) [Sollins and Masinter, 1994], handles [Kahn and Wilensky, 1995], Persistent Uniform Resource Locators (PURLs) [OCLC] or Common Names [CNRP]. Other approaches involve monitoring and notification to ensure referential integrity (e.g., [Ingham et al., 1996], [Mind-it], [Macskassy and Shklar, 1997], [Francis et al., 1995]). In this paper, we demonstrate a different approach to this problem. ...
Article
We propose robust hyperlinks as a solution to the problem of broken hyperlinks. A robust hyperlink is a URL augmented with a small "signature", computed from the referenced document. The signature can be submitted as a query to web search engines to locate the document. It turns out that very small signatures are sufficient to readily locate individual documents out of the many millions on the web. Robust hyperlinks exhibit a number of desirable qualities: They can be computed and exploited automatically, are small and cheap to compute (so that it is practical to make all hyperlinks robust), do not require new server or infrastructure support, can be rolled out reasonably well in the existing URL syntax, can be used to automatically retrofit existing links to make them robust, and are easy to understand. In particular, one can start using robust hyperlinks now, as servers and web pages are mostly compatible as is, while clients can increase their support in the future. Robust hyperlinks are one example of using the web to bootstrap new features onto itself.
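A rough, hypothetical sketch of the robust-hyperlink idea described above: pick a handful of low-frequency words from the referenced document as a signature and append them to the URL, so the signature can later be submitted to a search engine as a query. The term-selection heuristic and the query-parameter name used here are stand-ins, not the paper's exact method.

from collections import Counter
from urllib.parse import urlencode

def signature(text, k=5):
    """Pick k low-frequency words from the document to act as its signature."""
    words = [w.lower() for w in text.split() if w.isalpha() and len(w) > 4]
    counts = Counter(words)
    return sorted(counts, key=lambda w: (counts[w], w))[:k]

def robust_url(url, text):
    """Append the signature to the URL as a query parameter (name is illustrative)."""
    return url + "?" + urlencode({"lexical-signature": " ".join(signature(text))})

doc = "Ingrid places links between resources according to their topic similarity"
print(robust_url("http://example.org/ingrid.html", doc))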
... Such methods can be roughly classified as being based on database or knowledge representation techniques (e.g., [24, 6] and [3, 40, 29, 34] respectively). From the DB perspective, the Web is regarded as a federation of databases, and query answering is based on the availability of ad hoc wrappers and mediators for each specific information source. ...
Article
Full-text available
The need for friendly environments for effective information access is further reinforced by the growth of the global Internet, which is causing a dramatic change in both the kind of people who access the information and the types of information itself (ranging from unstructured multimedia data to traditional record-oriented data). To cope with these new demands, the interaction techniques traditionally offered to users have to evolve and eventually integrate into a powerful interface to the global information infrastructure. The new interaction mechanisms must be especially friendly and easy to use, since, given the enormous quantity of information sources available on the Internet, most users remain "permanent novices" with respect to each one of the sources they have access to. This tutorial offers a survey of the main approaches adopted for letting users effectively interact with the Web. Thus, it covers topics related to both extracting the information of interest spre...
... Second, a need to perform directed searches. The formation of a search network based on interests was first proposed in [15]. ...
Article
This thesis proposes a reorganization algorithm, based on the region abstraction, to exploit the natural structure in overlays that stems from common interests. Nodes selfishly adapt their connectivity within the overlay in a distributed fashion such that the topology evolves to clusters of users with shared interests. Our architecture leverages the inherent heterogeneity of users and places within the system their incentives and ability to affect the network. As such, it is not dependent on the altruism of any other nodes in the system. Of particular interest is the optimality and fairness of our design. We rigorously define ideal and fair networks and develop a continuum of optimality measures by which to evaluate our algorithm. Further, to evaluate our algorithm within a realistic context, validate assumptions and make design decisions, we capture data from a portion of a live file-sharing network. More importantly, we discover, name, quantify and solve several previously unrecognized subtle problems in a content-based self-organizing network as a direct result of simulations using the trace data. We motivate our design by examining the dependence of existing systems on benevolent Super-Peers. Through simulation we find that the current architecture is highly dependent on the filtering capability and the willingness of the SuperPeer network to absorb the majority of the query burden. The remainder of the thesis is devoted to a world in which SuperPeers no longer exist or are untenable. In our evaluation, we introduce four reasons for utility suboptimal self-reorganizing networks: anarchy (selfish behavior), indifference, myopia and ordering. We simulate the level of utility and happiness achieved in existing architectures. Then we systematically tear down implicit assumptions of altruism while showing the resulting negative impact on utility. From a selfish equilibrium, with much lower global utility, we show the ability of our algorithm to reorganize and restore the utility of individual nodes, and the system as a whole, to similar levels as realized in the SuperPeer network. Simulation of our algorithm shows that it reaches the predicted optimal utility while providing fairness not realized in other systems. Further analysis includes an epsilon equilibrium model where we attempt to more accurately represent the actual reward function of nodes. We find that by employing such a model, over 60% of the nodes are connected. In addition, this model converges to a utility 34% greater than achieved in the SuperPeer network while making no assumptions on the benevolence of nodes or centralized organization. Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004. Includes bibliographical references (p. 92-95).
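The reorganization idea in the abstract above, nodes selfishly re-wiring toward peers with shared interests, can be illustrated with a simplified sketch; the interest sets, overlap measure, and neighbor budget below are illustrative assumptions rather than the thesis's actual algorithm.

def overlap(a, b):
    """Shared-interest overlap between two interest sets."""
    return len(a & b)

def choose_neighbors(my_interests, candidates, k=2):
    """Selfishly keep the k peers whose interests overlap most with ours."""
    ranked = sorted(candidates, key=lambda n: overlap(my_interests, candidates[n]),
                    reverse=True)
    return ranked[:k]

peers = {
    "n1": {"jazz", "blues"},
    "n2": {"jazz", "rock"},
    "n3": {"opera"},
}
print(choose_neighbors({"jazz", "blues", "rock"}, peers))   # ['n1', 'n2']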
Article
The Digital Revolution has been spreading with the popularization of the Internet and is likely to change the social structure. Electronic Commerce (EC) is a virtual environment supporting social activities, from information navigation/advertisement to payment and settlement to distribution, via open networks and digital information. This paper reviews the shift in emphasis from "atoms" (physical objects) to "bits" (digital information) and the current problems in EC services. We then survey recent trends in technologies supporting EC: information retrieval and advertisement, payment and settlement, and information distribution.
Article
In recognition of the need to provide better access to Web resources, a number of prototypes, products and services have emerged that provide some form of automated categorisation of Net resources. Several representative current efforts that apply established as well as more innovative methods of automated classification, organisation or other forms of categorisation are profiled.