Figure 1 - uploaded by Esteban Feuerstein

Context in source publication

Context 1
... cache attempts to exploit frequently occurring pairs of terms by keeping in the memory of the search node the results of intersecting the corresponding inverted lists. Figure 1 shows the architecture of a search engine (SE) including the basic caching levels. ...

Citations

... Further studies regarding cost-aware intersection caching are presented in Feuerstein and Tolosa (2013) and Feuerstein and Tolosa (2014). These works focus on two different scenarios: the inverted index residing on disk, and the inverted index residing in main memory. ...
Article
Full-text available
Modern information retrieval systems use several levels of caching to speed up computation by exploiting frequent, recent or costly data used in the past. Previous studies show that the use of caching techniques is crucial in search engines, as it helps reduce query response times and processing workloads on search servers. In this work we propose and evaluate a static cache that acts simultaneously as a list and intersection cache, offering a more efficient way of handling cache space. We also use a query resolution strategy that takes advantage of the existence of this cache to reorder the query execution sequence. In addition, we propose effective strategies to select the term pairs that should populate the cache. We also represent the data in the cache in both raw and compressed forms and evaluate the differences between them using different cache-size configurations. The results show that the proposed Integrated Cache outperforms the standard posting lists cache in most cases, taking advantage not only of the intersection cache but also of the query resolution strategy.
... In previous work we showed that it is possible to obtain extra savings by increasing the efficiency and effectiveness [5] of the intersection cache, and we proposed hybrid data structures [8]. In this work we go a step further in that direction, introducing data mining techniques to decide which items should be cached and which should not. ...
... Given a conjunctive query q = {t 1 , t 2 , t 3 , ..., t n } that represents a user's information need (each t i is a term, and the user wants the list of documents that contain all the terms), we adopt a query resolution strategy that first decomposes the query into pairs of terms. Each pair is checked in the Intersection Cache, and the final resolution order is determined by first considering the pairs that are present in the cache and then intersecting them with the remaining ones [5]. The intersection cache is dynamically managed using the Greedy-Dual Size (GDS) strategy [4], and an interesting challenge is to define an access policy, i.e., to decide in advance which pairs will be admitted into the cache and which will not. ...
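The snippet does not give implementation details, so the following is a minimal Python sketch of how a GDS-managed intersection cache might look; class, variable, and method names are illustrative assumptions, not taken from the cited papers. GDS assigns each cached pair a priority H = L + cost/size, evicts the entry with the smallest H, and raises the inflation value L to the victim's priority, so cheap-to-recompute or rarely hit entries age out first.

```python
import heapq

class GDSIntersectionCache:
    """Sketch of a Greedy-Dual Size (GDS) cache for term-pair
    intersections. Keys are sorted 2-tuples of terms; all names
    are illustrative, not from the cited papers."""

    def __init__(self, capacity):
        self.capacity = capacity   # size budget, e.g. total postings held
        self.used = 0
        self.inflation = 0.0       # L: raised to the priority of each victim
        self.entries = {}          # pair -> (priority, postings, cost, size)
        self.heap = []             # (priority, pair), with lazy deletion

    def get(self, pair):
        entry = self.entries.get(pair)
        if entry is None:
            return None            # miss: caller intersects and may put()
        _, postings, cost, size = entry
        prio = self.inflation + cost / size   # on a hit, H = L + cost/size
        self.entries[pair] = (prio, postings, cost, size)
        heapq.heappush(self.heap, (prio, pair))
        return postings

    def put(self, pair, postings, cost):
        if pair in self.entries:
            return
        size = max(len(postings), 1)
        if size > self.capacity:
            return
        while self.used + size > self.capacity:
            self._evict()
        prio = self.inflation + cost / size
        self.entries[pair] = (prio, postings, cost, size)
        heapq.heappush(self.heap, (prio, pair))
        self.used += size

    def _evict(self):
        # Skip stale heap records until a live, up-to-date entry is found.
        while self.heap:
            prio, pair = heapq.heappop(self.heap)
            entry = self.entries.get(pair)
            if entry is not None and entry[0] == prio:
                self.inflation = prio          # L := priority of the victim
                self.used -= entry[3]
                del self.entries[pair]
                return
```

An access policy of the kind the snippet mentions would sit in front of put(), filtering which pairs are allowed to enter at all.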
... Our baseline is the basic caching algorithm without any cache access policy (NoAP), and the ideal bound is a variation of a clairvoyant algorithm that "knows" all the singleton pairs and avoids caching them (AP-Clair). We explore several cache sizes and run 5 million queries over a simulation framework [5]. Figure 1 shows the results we obtain. ...
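As a rough illustration of the AP-Clair idea only (an offline policy that admits non-singleton pairs), here is a hedged sketch; the pair-extraction step and all names are assumptions for illustration, not the paper's implementation.

```python
from collections import Counter
from itertools import combinations

def clairvoyant_admission(query_stream):
    """AP-Clair-style sketch: scan the whole query stream offline,
    count term-pair occurrences, and admit only pairs that occur
    more than once (i.e., skip singleton pairs)."""
    freq = Counter()
    for query in query_stream:
        for pair in combinations(sorted(set(query)), 2):
            freq[pair] += 1
    return lambda pair: freq[pair] > 1   # admission predicate
```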
... In the case of industry-scale search engines that store the entire index in main memory [5], the List Cache becomes useless, but the intersection cache is still useful [9] because it saves CPU time (i.e., the cost of intersecting two posting lists). For more general cases, such as medium-scale systems, only a fraction of the index is maintained in cache. ...
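To make that CPU cost concrete, the following is the standard linear merge of two sorted posting lists, i.e., the work a hit in the intersection cache avoids; the function name is illustrative.

```python
def intersect(p1, p2):
    """Linear merge of two sorted posting lists (lists of doc IDs)."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])   # doc appears in both lists
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out
```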
... The first proposal on intersection caching appears in [13], where the authors introduce a three-level caching architecture for a web search engine. Further studies on cost-aware intersection caching are presented in [8] and [9]. In a more recent work, Ozcan et al. [16] introduce a five-level static caching architecture. ...
... More precisely, as our main contribution we propose a static cache (named Integrated Cache) that replaces both the list and intersection caches using a data structure previously used for pairing terms in disk-based inverted indexes. This data structure already makes efficient use of memory space, but we design a specific cache management strategy that avoids the duplication of cached terms, and we adopt a query resolution strategy (named S4 in [9]) that tries to maximize the hit ratio. ...
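The paper's exact data structure is not shown in this excerpt, so the following is only a simplified sketch of the combined lookup interface such an Integrated Cache might expose; it does not reproduce the term-pairing structure or the S4 strategy, and all names are hypothetical.

```python
class IntegratedCache:
    """Hypothetical lookup interface for a cache that serves both
    single-term posting lists and term-pair intersections."""

    def __init__(self):
        self.pairs = {}   # sorted 2-tuple of terms -> cached intersection
        self.terms = {}   # term -> cached full posting list

    def get_intersection(self, t1, t2):
        key = tuple(sorted((t1, t2)))
        if key in self.pairs:                 # intersection-cache hit
            return self.pairs[key]
        if t1 in self.terms and t2 in self.terms:
            # List-cache hit on both terms: intersect on the fly.
            return sorted(set(self.terms[t1]) & set(self.terms[t2]))
        return None                           # miss: go to the index
```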
Conference Paper
Full-text available
Modern information retrieval systems use several levels of caching to speed up computation by exploiting frequent, recent or costly data used in the past. In this study we propose and evaluate a static cache that works simultaneously as a list and intersection cache, offering a more efficient way of handling cache space. In addition, we propose effective strategies to select the term pairs that should populate the cache. Simulations using two datasets and a real query log reveal that the proposed approach improves overall performance in terms of total processing time, achieving savings of up to 40% in the best case.
Conference Paper
Caching is an effective optimization in large-scale web search engines: it reduces the underlying I/O burden on storage systems by leveraging cache locality. Result caches and posting list caches are widely used approaches. However, they do not perform well with long queries, and the policies used in intersection caches are inefficient and offer poor flexibility across different applications. In this paper, we analyze the characteristics of query term intersections in typical search engines and present a novel three-level cache architecture, called TLMCA, which combines the intersection cache, result cache, and posting list cache in memory. In TLMCA, we introduce an intersection cache data selection policy based on Top-N frequent itemset mining, and design an intersection cache data replacement policy based on incremental frequent itemset mining. The experimental results demonstrate that the proposed intersection cache selection and replacement policies used in TLMCA can improve retrieval performance by up to 27% compared to a two-level cache.
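As an illustration of the selection side only, here is a minimal sketch restricted to 2-itemsets (term pairs): mine the query log and seed the intersection cache with the N most frequent pairs. This restriction and all names are assumptions made for brevity; TLMCA's actual policy is based on general Top-N frequent itemset mining over query terms.

```python
from collections import Counter
from itertools import combinations

def top_n_pairs(query_log, n):
    """Return the n most frequent term pairs in the query log,
    e.g. to seed a static intersection cache."""
    freq = Counter()
    for query in query_log:
        for pair in combinations(sorted(set(query)), 2):
            freq[pair] += 1
    return [pair for pair, _ in freq.most_common(n)]
```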