Figure 1 - uploaded by Esteban Feuerstein

Context in source publication

Context 1
... cache attempts to exploit frequently occurring pairs of terms by keeping in the memory of the search node the results of intersecting the corresponding inverted lists. Figure 1 shows the architecture of a search engine (SE) including the basic caching levels. ...

Citations

... Further studies regarding cost-aware intersection caching are presented in Feuerstein and Tolosa (2013) and Feuerstein and Tolosa (2014). These works focus on two different scenarios: the inverted index residing on disk, and the inverted index residing in main memory. ...
Article
Full-text available
Modern information retrieval systems use several levels of caching to speed up computation by exploiting frequent, recent or costly data used in the past. Previous studies show that the use of caching techniques is crucial in search engines, as it helps reduce query response times and processing workloads on search servers. In this work we propose and evaluate a static cache that acts simultaneously as a list and intersection cache, offering a more efficient way of handling cache space. We also use a query resolution strategy that takes advantage of the existence of this cache to reorder the query execution sequence. In addition, we propose effective strategies to select the term pairs that should populate the cache. We also represent the data in the cache in both raw and compressed forms and evaluate the differences between them using different cache-size configurations. The results show that the proposed Integrated Cache outperforms the standard posting lists cache in most cases, taking advantage not only of the intersection cache but also of the query resolution strategy.
... In previous work we showed that it is possible to obtain extra savings by increasing the efficiency and effectiveness [5] of the intersection cache, and we proposed hybrid data structures [8]. In this work we go a step further in that direction, introducing data mining techniques to decide which items should be cached and which should not. ...
... Given a conjunctive query q = {t 1 , t 2 , t 3 , ..., t n } that represents a user's information need (each t i is a term, and the user wants the list of documents that contain all the terms), we adopt a query resolution strategy that first decomposes the query into pairs of terms. Each pair is checked in the Intersection Cache, and the final resolution order is determined by first considering the pairs that are present in the cache and then intersecting them with the remaining ones [5]. The intersection cache is dynamically managed using the Greedy-Dual Size (GDS) strategy [4], and an interesting challenge is to define an access policy, i.e., to decide in advance which pairs will be admitted into the cache and which will not. ...
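The snippet does not give implementation details, so the following is a minimal Python sketch of how a GDS-managed intersection cache might look; class, variable, and method names are illustrative assumptions, not taken from the cited papers. GDS assigns each cached pair a priority H = L + cost/size, evicts the entry with the smallest H, and raises the inflation value L to the victim's priority, so cheap-to-recompute or rarely hit entries age out first.

```python
import heapq

class GDSIntersectionCache:
    """Sketch of a Greedy-Dual Size (GDS) cache for term-pair
    intersections. Keys are sorted 2-tuples of terms; all names
    are illustrative, not from the cited papers."""

    def __init__(self, capacity):
        self.capacity = capacity   # size budget, e.g. total postings held
        self.used = 0
        self.inflation = 0.0       # L: raised to the priority of each victim
        self.entries = {}          # pair -> (priority, postings, cost, size)
        self.heap = []             # (priority, pair), with lazy deletion

    def get(self, pair):
        entry = self.entries.get(pair)
        if entry is None:
            return None            # miss: caller intersects and may put()
        _, postings, cost, size = entry
        prio = self.inflation + cost / size   # on a hit, H = L + cost/size
        self.entries[pair] = (prio, postings, cost, size)
        heapq.heappush(self.heap, (prio, pair))
        return postings

    def put(self, pair, postings, cost):
        if pair in self.entries:
            return
        size = max(len(postings), 1)
        if size > self.capacity:
            return
        while self.used + size > self.capacity:
            self._evict()
        prio = self.inflation + cost / size
        self.entries[pair] = (prio, postings, cost, size)
        heapq.heappush(self.heap, (prio, pair))
        self.used += size

    def _evict(self):
        # Skip stale heap records until a live, up-to-date entry is found.
        while self.heap:
            prio, pair = heapq.heappop(self.heap)
            entry = self.entries.get(pair)
            if entry is not None and entry[0] == prio:
                self.inflation = prio          # L := priority of the victim
                self.used -= entry[3]
                del self.entries[pair]
                return
```

An access policy of the kind the snippet mentions would sit in front of put(), filtering which pairs are allowed to enter at all.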
... Our baseline is the basic caching algorithm without any cache access policy (NoAP), and the ideal bound is a variation of a clairvoyant algorithm that "knows" all the singleton pairs and avoids caching them (AP-Clair). We explore several cache sizes and run 5 million queries over a simulation framework [5]. Figure 1 shows the results we obtain. ...
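As a rough illustration of the AP-Clair idea only (an offline policy that admits non-singleton pairs), here is a hedged sketch; the pair-extraction step and all names are assumptions for illustration, not the paper's implementation.

```python
from collections import Counter
from itertools import combinations

def clairvoyant_admission(query_stream):
    """AP-Clair-style sketch: scan the whole query stream offline,
    count term-pair occurrences, and admit only pairs that occur
    more than once (i.e., skip singleton pairs)."""
    freq = Counter()
    for query in query_stream:
        for pair in combinations(sorted(set(query)), 2):
            freq[pair] += 1
    return lambda pair: freq[pair] > 1   # admission predicate
```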
... In the case of industry-scale search engines that store the entire index in main memory [5], the List Cache becomes useless, but the intersection cache is still useful [9] because it saves CPU time (i.e., the cost of intersecting two posting lists). For more general cases, such as medium-scale systems, only a fraction of the index is maintained in cache. ...
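To make that CPU cost concrete, the following is the standard linear merge of two sorted posting lists, i.e., the work a hit in the intersection cache avoids; the function name is illustrative.

```python
def intersect(p1, p2):
    """Linear merge of two sorted posting lists (lists of doc IDs)."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])   # doc appears in both lists
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out
```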
... The first proposal on intersection caching appears in [13], where the authors introduce a three-level caching architecture for a web search engine. Further studies on cost-aware intersection caching are presented in [8] and [9]. In a more recent work, Ozcan et al. [16] introduce a five-level static caching architecture. ...
... More precisely, as our main contribution we propose a static cache (named Integrated Cache) that replaces both the list and intersection caches using a data structure previously used for pairing terms in disk-based inverted indexes. This data structure already makes efficient use of memory space, but we design a specific cache management strategy that avoids the duplication of cached terms, and we adopt a query resolution strategy (named S4 in [9]) that tries to maximize the hit ratio. ...
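The paper's exact data structure is not shown in this excerpt, so the following is only a simplified sketch of the combined lookup interface such an Integrated Cache might expose; it does not reproduce the term-pairing structure or the S4 strategy, and all names are hypothetical.

```python
class IntegratedCache:
    """Hypothetical lookup interface for a cache that serves both
    single-term posting lists and term-pair intersections."""

    def __init__(self):
        self.pairs = {}   # sorted 2-tuple of terms -> cached intersection
        self.terms = {}   # term -> cached full posting list

    def get_intersection(self, t1, t2):
        key = tuple(sorted((t1, t2)))
        if key in self.pairs:                 # intersection-cache hit
            return self.pairs[key]
        if t1 in self.terms and t2 in self.terms:
            # List-cache hit on both terms: intersect on the fly.
            return sorted(set(self.terms[t1]) & set(self.terms[t2]))
        return None                           # miss: go to the index
```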
Conference Paper
Full-text available
Modern information retrieval systems use several levels of caching to speed up computation by exploiting frequent, recent or costly data used in the past. In this study we propose and evaluate a static cache that works simultaneously as a list and intersection cache, offering a more efficient way of handling cache space. In addition, we propose effective strategies to select the term pairs that should populate the cache. Simulations using two datasets and a real query log reveal that the proposed approach improves overall performance in terms of total processing time, achieving savings of up to 40% in the best case.
Conference Paper
Caching is an effective optimization in large-scale web search engines: it reduces the underlying I/O burden on storage systems by leveraging cache locality. Result caches and posting list caches are widely used approaches. However, they do not perform well with long queries, and the policies used in intersection caches are inefficient and offer poor flexibility across different applications. In this paper, we analyze the characteristics of query term intersections in typical search engines and present a novel three-level cache architecture, called TLMCA, which combines the intersection cache, result cache, and posting list cache in memory. In TLMCA, we introduce an intersection cache data selection policy based on Top-N frequent itemset mining, and design an intersection cache data replacement policy based on incremental frequent itemset mining. The experimental results demonstrate that the proposed intersection cache selection and replacement policies used in TLMCA can improve retrieval performance by up to 27% compared to a two-level cache.
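As an illustration of the selection side only, here is a minimal sketch restricted to 2-itemsets (term pairs): mine the query log and seed the intersection cache with the N most frequent pairs. This restriction and all names are assumptions made for brevity; TLMCA's actual policy is based on general Top-N frequent itemset mining over query terms.

```python
from collections import Counter
from itertools import combinations

def top_n_pairs(query_log, n):
    """Return the n most frequent term pairs in the query log,
    e.g. to seed a static intersection cache."""
    freq = Counter()
    for query in query_log:
        for pair in combinations(sorted(set(query)), 2):
            freq[pair] += 1
    return [pair for pair, _ in freq.most_common(n)]
```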