An example HIN and its implied schema.

Source publication

Figure 3: MNC vs the average-case density estimator ...

Figure 4: Example workload and Overlap Tree.

Figure 5: Overlap Tree with cache entry dependencies.

Figure 7: Evaluation against single-query methods.

ATRAPOS: Evaluating Metapath Query Workloads in Real Time

Preprint

Jan 2022

Heterogeneous information networks (HINs) represent different types of entities and relationships between them. Exploring, analysing, and extracting knowledge from such networks relies on metapath queries that identify pairs of entities connected by relationships of diverse semantics. While the real-time evaluation of metapath query workloads on la...

Context 1

... example HIN capturing scholarly data, along with its implied schema 2 that represents the involved entity types, relationship types, and properties, is illustrated in Figure 1. It consists of nodes representing papers (P), authors (A), venues (V), and topics (T) and (bidirectional) edges of three types: authors -papers (AP / PA), papers -topics (PT / TP), and papers -venues (PV / VP). ...

Context 2

... should be highlighted that this convention is followed here solely for the sake of simplicity; all approaches can easily accommodate edges of different types among the same pairs of nodes (e.g., since they employ separate adjacency matrices for distinct edge types). Figure 1 shows an example HIN comprising 14 nodes of 4 types (A, P, T and V) and 3 edge types (AP/PA, PT/TP, and PV/VP). A simplified metapath of length 3 is m = ⟨APT⟩, and an instance of m is the path ⟨'...
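The remark about separate adjacency matrices per edge type can be illustrated with a minimal sketch. It assumes the common matrix-multiplication view of metapath evaluation, in which multiplying the adjacency matrices along a metapath counts its instances between each pair of end nodes. The toy matrices and the helper names `matmul` and `metapath_counts` are invented for illustration; they do not reproduce the HIN of Figure 1 or any system's actual code.

```python
# Toy HIN invented for illustration (not the HIN of Figure 1).
# One adjacency matrix per edge type, stored as lists of rows.
AP = [  # authors x papers: AP[a][p] == 1 if author a wrote paper p
    [1, 1, 0],
    [0, 1, 1],
]
PT = [  # papers x topics: PT[p][t] == 1 if paper p covers topic t
    [1, 0],
    [1, 1],
    [0, 1],
]


def matmul(x, y):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]


def metapath_counts(matrices):
    """Chain-multiply the adjacency matrices along a metapath.

    Entry [i][j] of the result is the number of metapath instances
    that start at node i and end at node j.
    """
    result = matrices[0]
    for m in matrices[1:]:
        result = matmul(result, m)
    return result


# Metapath <APT>: authors connected to topics through their papers.
apt = metapath_counts([AP, PT])
print(apt)
```

In this toy data, each entry of `apt` counts the papers that link an author to a topic. Dense Python lists keep the sketch dependency-free; a real system would more plausibly use sparse matrices.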

Context 3

... the metapath ⟨APTPA⟩ of the HIN in Figure 1, which connects two authors that have published a paper on the same topic. A data scientist working with this HIN who needs to examine only recent papers, e.g., papers published after 2000, may use the constrained metapath m′ = (⟨APTPA⟩, {...
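One way to picture a constrained metapath is as a filter applied to the adjacency matrices before they are multiplied. The sketch below assumes that approach and is not taken from the paper. The toy data, the `paper_year` property and the helper names are all hypothetical; the excerpt truncates the actual constraint, so the "year after 2000" predicate simply follows the example given in the text.

```python
# Toy data invented for illustration; `paper_year` is a hypothetical property.
AP = [[1, 1, 0],   # authors x papers
      [0, 1, 1]]
PT = [[1, 0],      # papers x topics
      [1, 1],
      [0, 1]]
paper_year = [1998, 2005, 2010]


def matmul(x, y):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def filter_papers(ap, keep):
    """Zero out the paper columns that fail the constraint."""
    return [[v if k else 0 for v, k in zip(row, keep)] for row in ap]


# Constraint: keep only papers published after 2000.
keep = [year > 2000 for year in paper_year]
AP_c = filter_papers(AP, keep)

# <APTPA> is <APT> followed by its mirror image <TPA>, so the filtered
# AP matrix constrains both occurrences of P.
APT_c = matmul(AP_c, PT)
APTPA_c = matmul(APT_c, transpose(APT_c))
print(APTPA_c)
```

Filtering the AP matrix once is enough here because the second half of the metapath is the transpose of the first, so both paper positions see the same constraint.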

Context 4

... session restart probability. Figure 10 shows the improvement in execution time over HRank-S for CBS1, CBS2 and Atrapos as the session restart probability falls. As this probability grows, the performance of all cache-based methods degrades, as there are fewer queries per session on average, hence fewer overlaps to exploit. ...

Context 5

... query workloads. Figure 11 shows execution times per query as we vary the distribution from which we select metapaths and constraints. Apart from the uniform distribution, used in other experiments, we consider Zipfian distributions varying the scaling parameter α. ...

Context 6

... we investigate the performance of individual queries, averaging the reported execution times by query position across 10 workflows. Figure 12 shows the cumulative time during workload execution; cache-based approaches are noticeably faster than HRank-S, especially for the Scholarly HIN. Atrapos is also considerably faster than CBS1 and CBS2 with the difference increasing with the number of queries. ...

Context 7

... Atrapos is marginally faster than the PGDS policy on the News articles HIN; the most notable difference appears with a cache size of 8GB; at the same time, both cache policies incorporating frequency and item size outperform LRU. Figure 15 presents our results for all policies as we vary the dataset size. We observe a linear increase in execution time per query for all approaches as dataset size grows. ...

Context 8

... session restart probability. Figure 16 presents performance when varying the session restart probability. The speedup falls for all methods as this probability grows, which is reasonable, as a larger value results in sessions containing fewer queries and thus fewer overlaps to exploit. ...

Context 9

... of them outperform LRU, with significant differences in both datasets. Figure 17 illustrates the performance of the examined cache replacement policies while using a Zipfian distribution for query workload generation. We observe a notable performance improvement with all approaches when the query workload is generated from a Zipfian distribution rather than a uniform one. ...

Context 10

... both of these approaches outperform LRU on both datasets. the Scholarly HIN (Figure 19a), in which LRU achieves considerably higher execution times than the other two approaches. Last but not least, Figure 19c reconfirms that PGDS and Atrapos are faster than LRU, as each quartile in their box plots starts lower than the respective quartile for LRU for the Scholarly HIN. ...

Contexts in source publication