Michael Persin's research while affiliated with RMIT University and other places

Publications (3)

Article
Ranking techniques are effective at finding answers in document collections but can be expensive to evaluate. We propose an evaluation technique that uses early recognition of which documents are likely to be highly ranked to reduce costs; for our test data, queries are evaluated in 2% of the memory of the standard implementation without degradatio...
Conference Paper
Ranking techniques are effective for finding answers in document collections but the cost of evaluation of ranked queries can be unacceptably high. We propose an evaluation technique that reduces both main memory usage and query evaluation time, based on early recognition of which documents are likely to be highly ranked. Our experiments show tha...
Conference Paper
For large document databases, evaluation of ranked queries can be expensive in CPU time, memory usage, and disk traffic. It has been shown that memory usage can be dramatically reduced by use of a simple filtering heuristic that eliminates most documents from consideration. In this paper we show that, by designing inverted indexes explicitly to sup...

Citations

... Another problem studied is the optimization of query execution in order to find the k top-ranked objects without the need to access all the ranks from the information sources [FAGI96]. A similar approach based on heuristic thresholds on rank values is also reported in [PERS94] in the context of processing inverted lists in an IRS. A rule-based approach to query optimization in mediators is presented in [HKWY97]. ...
... Our work is also related to work in information retrieval (IR) optimization [5] [14] [15] [16] [17] [22] [23]. In IR, each document can be represented by a sparse vector of weighted terms, indexed through inverted lists. ...
... In this scenario, we select every DocID that does not exceed the maximum current DocID (a document is a candidate if DocID ≤ MaxCurDocID). The set of documents selected is {It 1 (1, 2); It 2 (2); It 3 (3, 5); It 4 (4); It 5 (6)} = {1; 2; 3; 4; 5; 6}. This list is sorted in ascending order, and the first k'' documents are inserted into the top-k heap so that their exact scores can be calculated, where k'' = k - k'. ...
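
The candidate-selection step in the last excerpt can be sketched in Python. This is only an illustration of the step as quoted, not the citing authors' implementation: the function name select_candidates, the dictionary layout of the item lists, and the values chosen for k, k' and MaxCurDocID are assumptions, since the excerpt does not give them.

```python
def select_candidates(item_lists, max_cur_doc_id, k, k_prime):
    """Return the first k'' = k - k' candidate DocIDs in ascending order.

    item_lists maps an item name to the DocIDs in its list (hypothetical layout).
    """
    # Keep every DocID that does not exceed the maximum current DocID;
    # a set removes DocIDs that occur in more than one item list.
    candidates = {
        doc_id
        for doc_ids in item_lists.values()
        for doc_id in doc_ids
        if doc_id <= max_cur_doc_id
    }
    # k'' is the number of heap slots still to fill (never negative).
    k_second = max(k - k_prime, 0)
    # Sort ascending and keep the first k'' DocIDs.
    return sorted(candidates)[:k_second]


# Item lists taken from the excerpt; k, k' and MaxCurDocID are assumed values.
item_lists = {"It1": [1, 2], "It2": [2], "It3": [3, 5], "It4": [4], "It5": [6]}
selected = select_candidates(item_lists, max_cur_doc_id=6, k=5, k_prime=2)
print(selected)
```

With these assumed values, k'' = 3, so the three smallest candidate DocIDs are the ones passed on for exact scoring.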