Article

An impact ordering approach for indexing fuzzy sets

Abstract

We propose an approach for indexing fuzzy data based on inverted files that speeds up retrieval considerably by stopping the traversal of postings lists early. This is possible because the entries in the postings lists are organized in a way that guarantees that there are no matching items beyond a certain point in a list. Consequently, we can reduce the number of false positives significantly, leading to an increase in retrieval performance. We have implemented our approach and evaluated it experimentally, including a test on skewed and real-world data, comparing it to an approach that has previously been shown to be superior to other methods.
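
The early-termination idea can be pictured with a minimal sketch. It is not the authors' actual structure, which the abstract does not detail. It assumes that each item is a fuzzy set (element to membership degree), that each postings list is sorted by descending degree, and that a query asks for items in which every query element reaches a given threshold. The names `build_index` and `query`, and this query semantics, are assumptions made for illustration.

```python
from collections import defaultdict


def build_index(items):
    """Build an inverted file from item_id -> {element: degree in (0, 1]}.

    Each postings list holds (degree, item_id) pairs sorted by descending
    degree, so entries that can still match come first (assumed layout).
    """
    index = defaultdict(list)
    for item_id, fuzzy_set in items.items():
        for element, degree in fuzzy_set.items():
            index[element].append((degree, item_id))
    for postings in index.values():
        postings.sort(key=lambda p: (-p[0], p[1]))
    return dict(index)


def query(index, elements, threshold):
    """Return (matching item ids, number of postings read).

    An item matches if every query element has degree >= threshold.
    Scanning a list stops at the first entry below the threshold,
    because the ordering guarantees no later entry can qualify.
    """
    result = None
    scanned = 0
    for element in elements:
        matches = set()
        for degree, item_id in index.get(element, []):
            scanned += 1
            if degree < threshold:
                break  # early termination
            matches.add(item_id)
        result = matches if result is None else result & matches
        if not result:
            break  # no item can satisfy the remaining elements
    return (result or set()), scanned


items = {
    1: {"a": 0.9, "b": 0.4},
    2: {"a": 0.7, "b": 0.8},
    3: {"a": 0.2, "b": 0.9},
    4: {"a": 0.1},
}
index = build_index(items)
matching, scanned = query(index, ["a", "b"], 0.5)
```

In this toy data the saving is small, but on long lists with few high-degree entries most of each list would never be read.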

Article
Uncertainty is pervasive in data- and knowledge-intensive applications, in which fuzzy information processing plays a crucial role. Fuzzy sets have been extensively used to enhance various database models for managing fuzzy data or flexibly querying crisp data. This has resulted in numerous contributions in this research area. This paper pays attention to three crucial issues in fuzzy techniques for data management: modeling fuzzy data, querying fuzzy data, and fuzzy queries over crisp data, and provides a full up-to-date survey on the current state of the art in fuzzy data modeling and querying. The paper identifies fuzzy conceptual data models, fuzzy (relational and object-oriented) database models and the fuzzy XML model, as well as the relationships among these fuzzy data models. For each type of fuzzy data model, the paper summarizes its query processing. The paper also reviews fuzzy querying over classical data models. In addition to providing a generic overview of the approaches for fuzzy data modeling and querying, this survey paper serves to identify possible research opportunities in the area of fuzzy data processing.
Article
When record sets become large, indexing becomes a required technique for speeding up querying. This paper proposes an indexing technique for interval data. Such data are common in possibility based relational databases but are also frequently used in other applications. Our approach is an adaptation of a B⁺-tree, which is currently still one of the most efficient indexing techniques. Because it can store interval data, we name it the Interval B⁺-tree (IBPT). It is illustrated how an IBPT index can be built and applied in practice to speed up the evaluation of fuzzy queries on possibilistic relational databases.
Article
It is widely known that the most effective way to implement a fuzzy database is to use a classical Relational Database Management System (RDBMS) as the basis. All these systems provide several kinds of indexing methods to improve the execution time of classical queries, but they are useless when directly applied to fuzzy queries. For this reason, in this work we propose and evaluate several fuzzy indexing techniques implemented over the indexing techniques available on classical RDBMS in order to enhance flexible queries when based on the necessity measure. As the results show, the best evaluated fuzzy indexing techniques can be implemented on top of classical RDBMS.
Article
A common way to implement a fuzzy database is on top of a classical relational database management system (RDBMS). Given that almost all RDBMS provide indexing mechanisms to enhance classical query processing performance, finding ways to use these mechanisms to enhance the performance of flexible query processing is of enormous interest. This work proposes and evaluates a set of indexing strategies, implemented exclusively on top of classical RDBMS indexing structures, designed to improve flexible query processing performance, focusing on the case of possibility queries. Results show the best indexing strategies for different data and query scenarios, offering effective ways to implement fuzzy data indexes on top of a classical RDBMS.
Conference Paper
This paper studies the influence of data distribution and clustering on the performance of currently available indexing methods, namely GT and HBPT, for solving necessity-measured flexible queries on numerical imprecise data. Studying these data scenarios yields valuable information about the expected performance of these indexes on real-world data and query sets, which are usually affected by different skew factors. Results reveal some sensitivity of GT and no influence of the considered data scenarios on HBPT.
Article
Information technology is one of the most rapidly changing disciplines, especially with the fuzzy extension. Fuzzy databases have been studied in many works and papers but, in general, these works study some particular area and many are theoretical, with very few real applications. The Handbook of Research on Fuzzy Information Processing in Databases provides comprehensive coverage and definitions of the most important issues, concepts, trends, and technologies in fuzzy topics applied to databases, discussing current investigation into uncertainty and imprecision management by means of fuzzy sets and fuzzy logic in the field of databases and data mining. This compendium of research offers researchers, students, and organizations a complete, practical guide to fuzzy information processing in databases.
Article
Research into inverted file compression has focused on compression ratio—how small the indexes can be. Compression ratio is important for fast interactive searching. It is taken as read, the smaller the index, the faster the search. The premise “smaller is better” may not be true. To truly build faster indexes it is often necessary to forfeit compression. For inverted lists consisting of only 128 occurrences compression may only add overhead. Perhaps the inverted list could be stored in 128 bytes in place of 128 words, but it must still be stored on disk. If the minimum disk sector read size is 512 bytes and the word size is 4 bytes, then both the compressed and raw postings would require one disk seek and one disk sector read. A less efficient compression technique may increase the file size, but decrease load/decompress time, thereby increasing throughput. Examined here are five compression techniques, Golomb, Elias gamma, Elias delta, Variable Byte Encoding and Binary Interpolative Coding. The effect on file size, file seek time, and file read time are all measured as is decompression time. A quantitative measure of throughput is developed and the performance of each method is determined.
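To make the size trade-off above concrete: 128 postings stored as 4-byte words take 128 × 4 = 512 bytes, exactly one 512-byte sector, so shrinking them to 128 bytes saves no sector reads. The sketch below illustrates one of the five techniques named, Variable Byte Encoding, applied to gaps between sorted document ids. The byte layout (7 data bits per byte, low-order group first, high bit set on the final byte) is one common convention and an assumption here, not necessarily the layout measured in the paper; the function names are illustrative.

```python
def to_gaps(sorted_ids):
    """Turn ascending ids into the first id followed by differences."""
    gaps, previous = [], 0
    for doc_id in sorted_ids:
        gaps.append(doc_id - previous)
        previous = doc_id
    return gaps


def from_gaps(gaps):
    """Inverse of to_gaps: running sum restores the original ids."""
    ids, total = [], 0
    for gap in gaps:
        total += gap
        ids.append(total)
    return ids


def vbyte_encode(numbers):
    """Encode non-negative integers, 7 bits per byte, low group first.

    The high bit is set only on the last byte of each number.
    """
    out = bytearray()
    for n in numbers:
        if n < 0:
            raise ValueError("variable byte encoding needs n >= 0")
        while n >= 128:
            out.append(n & 0x7F)
            n >>= 7
        out.append(n | 0x80)
    return bytes(out)


def vbyte_decode(data):
    """Decode a byte string produced by vbyte_encode."""
    numbers, n, shift = [], 0, 0
    for byte in data:
        if byte & 0x80:  # final byte of this number
            numbers.append(n | ((byte & 0x7F) << shift))
            n, shift = 0, 0
        else:
            n |= byte << shift
            shift += 7
    return numbers


doc_ids = [3, 130, 131, 20000]
encoded = vbyte_encode(to_gaps(doc_ids))
decoded = from_gaps(vbyte_decode(encoded))
```

Here four ids that would occupy 16 bytes as 4-byte words fit in 6 bytes, at the cost of a decode loop on every read, which is the kind of load/decompress trade-off the abstract describes.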
Article
The efficient retrieval of data items on set-valued attributes is an important research topic that has attracted little attention so far. We studied and modified four index structures (sequential signature files, signature trees, extendible signature hashing, and inverted files) for a fast retrieval of sets with low cardinality. We compared the index structures by implementing them and subjecting them to extensive experiments, investigating the influence of query set size, database size, domain size, and data distribution (synthetic and real). The results of the experiments clearly indicate that inverted files exhibit the best overall behavior of all tested index structures. ... index structures, spatial index structures, join algorithms) to implement four index structures for set-valued attributes: sequential signature files, signature trees, extendible signature hashing, and inverted files. We are looking for a single index structure that is capable of supporting queries involving equal, subset, and superset predicates efficiently. Some of the existing index structures have been shown to excel at one or two of these query types but fail at the others. We investigate the index structures in view of overall performance for all mentioned query types. An evaluation of these index structures is very complex since there is a huge number of parameters involved. The complexity of the task and the lack of theoretical tools for evaluating the performance of index structures motivated us to conduct extensive experiments. Having determined the evaluation method, we decided to reveal implementation details of our index structures so that the results of the experiments would become more transparent.
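As a rough illustration of how an inverted file can answer the three predicates mentioned above (equal, subset, superset) on set-valued attributes, here is a minimal in-memory sketch. It is not the implementation evaluated in the paper; the function names and the choice to keep the records in a dictionary for cardinality checks are assumptions.

```python
from collections import Counter, defaultdict


def build_inverted_file(records):
    """Map each element to the set of record ids whose set contains it."""
    index = defaultdict(set)
    for rid, elements in records.items():
        for element in elements:
            index[element].add(rid)
    return index


def superset_query(index, records, q):
    """Records whose set contains every element of q."""
    if not q:
        return set(records)
    lists = [index.get(element, set()) for element in q]
    return set.intersection(*lists)


def subset_query(index, records, q):
    """Records whose set is contained in q.

    A record qualifies when the number of query lists it appears in
    equals its own cardinality; empty sets qualify trivially.
    """
    hits = Counter()
    for element in q:
        for rid in index.get(element, set()):
            hits[rid] += 1
    result = {rid for rid, count in hits.items() if count == len(records[rid])}
    result |= {rid for rid, elements in records.items() if not elements}
    return result


def equality_query(index, records, q):
    """Records whose set equals q: a superset match of the same size."""
    return {rid for rid in superset_query(index, records, q)
            if len(records[rid]) == len(q)}


records = {1: {"a", "b"}, 2: {"a", "b", "c"}, 3: {"c"}, 4: set()}
index = build_inverted_file(records)
```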
Article
The database research with focus on integration of text, data, code, fusion of information from heterogeneous data sources, and information privacy, conducted at Lowell, is discussed. The object-oriented (OO) and object-relational (OR) database management systems (DBMS) showed how text and other data types can be added to a DBMS. Several goals mentioned in the Lowell meeting included the proposal to reconsider DBMS architecture to handle new data types, approximate reasoning, and treating procedures and data as co-equal. It was found that information integration research would be well served by generating a test bed and collection of integration tasks.
Article
A group of database researchers, architects, users, and pundits met in May 2008 at the Claremont Resort in Berkeley, CA, to discuss the state of database research and its effects on practice. This was the seventh meeting of this sort over the past 20 years and was distinguished by a broad consensus that the database community is at a turning point in its history, due to both an explosion of data and usage scenarios and major shifts in computing hardware and platforms. Here, we explore the conclusions of this self-assessment. It is by definition somewhat inward-focused but may be of interest to the broader computing community as both a window into upcoming directions in database research and
Chapter
This chapter introduces a fuzzy object-relational database model including fuzzy extensions of the basic object-relational databases constructs, the user-defined data types, and the collection types. The fuzzy extensions of these constructs focus on two main flexible aspects, a way to flexibly compare complex data types and an extension of collection types allowing partial membership of its elements. Collection operators are also adapted to consider flexibly comparable domains for its elements. Such a fuzzy object-relational database model, and its implementation in a fuzzy object-relational database management system, provides an easy and effective way to manage a great amount of complex fuzzy data in object-relational databases for emerging fuzzy applications. As a sample of the proposal advantages, an application for dominant color based image retrieval, which is built on an object-relational database management system implementing the proposed fuzzy database model, is introduced.
Article
This paper proposes an indexing procedure for improving the performance of query processing on a fuzzy database. It focuses on the case when a necessity-measured atomic flexible condition is imposed on the values of a fuzzy numerical attribute. The proposal is to apply a classical indexing structure for numerical crisp data, a B+-tree combined with a Hilbert curve. The use of such a common indexing technique makes its incorporation into current systems straightforward. The efficiency of the proposal is compared with that of another indexing procedure for similar fuzzy data and flexible query types. Experimental results reveal that the performance of the proposed method is similar to, and more stable than, that of its competitor.
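The abstract above does not say how fuzzy values are mapped onto the Hilbert curve, so the following is only a sketch of the general ingredient: converting a cell of an n × n grid (n a power of two) into its position along a Hilbert curve, using the widely published iterative conversion. The resulting integer is one-dimensional and could serve as a key in an ordinary ordered index such as a B+-tree. Treating a value as a pair of grid coordinates, and the name `hilbert_index`, are assumptions for illustration.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along a Hilbert curve over an n x n grid.

    n must be a power of two and 0 <= x, y < n.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    if not (0 <= x < n and 0 <= y < n):
        raise ValueError("coordinates out of range")
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect the quadrant so the sub-curve has standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


# Hypothetical use: sort 2-D points by their one-dimensional Hilbert key,
# as an ordered index would store them.
points = [(3, 0), (0, 0), (1, 1), (0, 2)]
ordered = sorted(points, key=lambda p: hilbert_index(4, p[0], p[1]))
```

The appeal of this curve for indexing is that cells with consecutive keys are always neighbours in the grid, so nearby values tend to land close together in the ordered index.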
Book
Fuzzy Databases: Modeling, Design and Implementation focuses on some semantic aspects which have not been studied in previous works and extends the EER model with fuzzy capabilities. The presented model is called the FuzzyEER model, and some of the studied extensions are: fuzzy attributes, fuzzy aggregations and different aspects of specializations, such as fuzzy degrees, fuzzy constraints, etc. All these fuzzy extensions offer greater expressiveness in conceptual design. Fuzzy Databases: Modeling, Design and Implementation also proposes a method to translate the FuzzyEER model to a classical DBMS, and defines FSQL (Fuzzy SQL), an extension of the SQL language that allows users to write flexible conditions in queries, using all extensions defined by the FuzzyEER model. This book, while providing a global and integrated view of fuzzy database constructions, serves as an introduction to fuzzy logic, fuzzy databases and fuzzy modeling in databases.
Article
A structure for representing inexact information in the form of a relational database is presented. The structure differs from ordinary relational databases in two important respects: Components of tuples need not be single values and a similarity relation is required for each domain set of the database. Two critical properties possessed by ordinary relational databases are proven to exist in the fuzzy relational structure. These properties are (1) no two tuples have identical interpretations, and (2) each relational operation has a unique result.
Article
In this paper a fuzzy approach for image retrieval on the basis of color features is presented. The proposal deals with vagueness in the color description and introduces the use of fuzzy database models to store and retrieve imprecise data. To handle the color description, the concept of dominant fuzzy color is proposed, using linguistic labels for representing the color information in terms of hue, saturation and intensity. To deal with fuzzy data in our database model, we use a general approach which can support the manipulation of fuzzy objects in an object-relational database system. This allows the retrieval of images by performing flexible queries on the database.
Article
This paper deals with relational databases which are extended in the sense that fuzzily known values are allowed for attributes. Precise as well as partial (imprecise, uncertain) knowledge concerning the value of the attributes is represented by means of [0,1]-valued possibility distributions in Zadeh's sense. Thus, we have to manipulate ordinary relations on Cartesian products of sets of fuzzy subsets rather than fuzzy relations. Besides, vague queries whose contents are also represented by possibility distributions can be taken into account. The basic operations of relational algebra, union, intersection, Cartesian product, projection, and selection are extended in order to deal with partial information and vague queries. Approximate equalities and inequalities modeled by fuzzy relations can also be taken into account in the selection operation. Then, the main features of a query language based on the extended relational algebra are presented. An illustrative example is provided. This approach, which enables a very general treatment of relational databases with fuzzy attribute values, makes an extensive use of dual possibility and necessity measures.
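For readers unfamiliar with the dual measures mentioned above, the sketch below computes them on a small discrete domain using the usual definitions from possibility theory: possibility is the maximum over the domain of min(condition, data), and necessity is the minimum of max(condition, 1 − data). Representing both the stored possibility distribution and the query condition as dictionaries, and the function names, are assumptions for illustration only.

```python
def possibility(condition, data, domain):
    """Degree to which the stored value possibly satisfies the condition.

    condition: membership function of the query, as {value: degree}.
    data: possibility distribution of the attribute, as {value: degree}.
    Values missing from a dictionary are taken to have degree 0.
    """
    return max(min(condition.get(x, 0.0), data.get(x, 0.0)) for x in domain)


def necessity(condition, data, domain):
    """Degree to which the stored value necessarily satisfies the condition."""
    return min(max(condition.get(x, 0.0), 1.0 - data.get(x, 0.0)) for x in domain)


domain = [1, 2, 3, 4]
data = {2: 1.0, 3: 0.5, 4: 0.5}        # "about 2, maybe 3 or 4"
condition = {1: 0.3, 2: 0.8, 3: 1.0}   # a flexible query condition

poss = possibility(condition, data, domain)
nec = necessity(condition, data, domain)
```

With a normalized distribution (some value has possibility 1), necessity never exceeds possibility, which is why necessity-based queries are the stricter of the two.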
Article
Providing efficient query processing in database systems is one step towards gaining acceptance of such systems by end users. We propose several techniques for indexing fuzzy sets in databases to improve the query evaluation performance. Three of the presented access methods are based on superimposed coding, while the fourth relies on inverted files. The efficiency of these techniques was evaluated experimentally. We present results from these experiments, which clearly show the superiority of the inverted files.
Article
This paper proposes an indexing technique for fuzzy numerical data which increases the performance of query processing when the query involves an atomic possibility-measured flexible condition. The proposal is based on a classical indexing mechanism for numerical crisp data, the B+-tree, which is implemented in most commercial database management systems (DBMS). This makes the proposed technique a good candidate for integration in a fuzzy DBMS when it is developed as an extension of a crisp DBMS. The efficiency of the proposal is contrasted with another indexing method for similar data and queries, the G-tree, which is specifically designed to index multidimensional data. Results show that the proposal's performance is similar to, and more stable than, that measured for the G-tree when used for indexing fuzzy numbers.
Article
Up to now, many theoretical works about fuzzy databases have been defined by designing some extension of the relational model. Our purpose is to discuss some implementation aspects related to these databases. This point of view does not seem to be a usual concern of such systems, whereas it is of prime importance with respect to their future performance. We focus on the evaluation of a class of queries, called mono-attribute restrictions. We show how some basic principles can be applied to improve such a “fuzzy associative” retrieval, by means of an indexing-like access method. Lastly, some implementation solutions are presented.
Conference Paper
The object-oriented data model has been utilized for geographical information applications, chiefly because of the rich set of modelling primitives it offers. Despite its representational capabilities, it still falls short in describing data typically seen in geographic information systems, i.e., imprecise, spatial or continuous valued data. In this paper, an extension to the object-oriented data model that permits the representation of imprecise data is discussed in the context of a soil information system. It is shown that this extension accommodates the querying and manipulation of spatial and continuous valued data.
Conference Paper
The signature approach is an access method for partial-match retrieval which meets many requirements of an office environment. Signatures are hash coded binary words derived from objects stored in the data base. They serve as a filter for retrieval in order to discard a large number of nonqualifying objects. In an indexed signature method the signatures of objects stored on a single page are used to form a signature for that page. In this paper we describe a new technique of indexed signatures which combines the dynamic balancing of B-trees with the signature approach. The main problem of appropriate splitting is solved in a heuristic way. Operations are described and a simple performance analysis is given. The analysis and some experimental results indicate a considerable performance gain. Moreover, the new S-tree approach supports a clustering on a signature basis. Further remarks on adaptability complete this work.
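The filtering role of signatures described above can be sketched as follows. This is a flat, unindexed filter using superimposed coding, not the S-tree itself: each element sets a few bits in a fixed-width word, an object's signature is the bitwise OR of its elements' signatures, and a partial-match query keeps only objects whose signature covers the query signature. The width `SIG_BITS`, the bits per element, and the use of SHA-256 to pick bit positions are all assumptions chosen for the example.

```python
import hashlib

SIG_BITS = 64          # assumed signature width
BITS_PER_ELEMENT = 3   # assumed number of bits set per element


def element_signature(element):
    """Hash an element to a word with up to BITS_PER_ELEMENT bits set."""
    digest = hashlib.sha256(str(element).encode("utf-8")).digest()
    signature = 0
    for i in range(BITS_PER_ELEMENT):
        position = int.from_bytes(digest[2 * i:2 * i + 2], "big") % SIG_BITS
        signature |= 1 << position
    return signature


def set_signature(elements):
    """Superimpose (bitwise OR) the signatures of all elements."""
    signature = 0
    for element in elements:
        signature |= element_signature(element)
    return signature


def partial_match(objects, query):
    """Ids of objects containing every query element.

    Step 1 discards objects whose signature cannot cover the query.
    Step 2 checks the survivors exactly, removing false drops caused
    by different elements setting the same bits.
    """
    query_sig = set_signature(query)
    candidates = [oid for oid, elements in objects.items()
                  if (set_signature(elements) & query_sig) == query_sig]
    return [oid for oid in candidates if set(query) <= set(objects[oid])]


objects = {
    1: {"red", "green"},
    2: {"red", "green", "blue"},
    3: {"blue"},
}
found = partial_match(objects, {"red", "green"})
```

The filter can admit false drops but never rejects a true match, which is why the exact check in the second step is needed only on the surviving candidates.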
Book
Recent years have seen an explosive growth in the use of new database applications such as CAD/CAM systems, spatial information systems, and multimedia information systems. The needs of these applications are far more complex than traditional business applications. They call for support of objects with complex data types, such as images and spatial objects, and for support of objects with wildly varying numbers of index terms, such as documents. Traditional indexing techniques such as the B-tree and its variants do not efficiently support these applications, and so new indexing mechanisms have been developed. As a result of the demand for database support for new applications, there has been a proliferation of new indexing techniques. The need for a book addressing indexing problems in advanced applications is evident. For practitioners and database and application developers, this book explains best practice, guiding the selection of appropriate indexes for each application. For researchers, this book provides a foundation for the development of new and more robust indexes. For newcomers, this book is an overview of the wide range of advanced indexing techniques. Indexing Techniques for Advanced Database Systems is suitable as a secondary text for a graduate level course on indexing techniques, and as a reference for researchers and practitioners in industry.
Article
This article is situated in the framework of the relational model of data and it focuses on two quite orthogonal issues, even if one may reasonably think that they can be mixed together. The first topic is concerned with non-traditional queries, called ...
Article
In general terms, an uncertain relation encodes a set of possible certain relations. There are many ways to represent uncertainty, ranging from alternative values for attributes to rich constraint languages. Among the possible models for uncertain data, ...
Article
A flexible model for evaluating soft query with unequal preferences in fuzzy databases is proposed. We assume that conditions with unequal preferences have an exclusive meaning like in the request “find a holiday accommodation such that big apartments ...
Book
Class-tested and coherent, this textbook teaches classical and web information retrieval, including web search and the related areas of text classification and text clustering from basic concepts. It gives an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents; methods for evaluating systems; and an introduction to the use of machine learning methods on text collections. All the important ideas are explained using examples and figures, making it perfect for introductory courses in information retrieval for advanced undergraduates and graduate students in computer science. Based on feedback from extensive classroom experience, the book has been carefully structured in order to make teaching more natural and effective. Slides and additional exercises (with solutions for lecturers) are also available through the book's supporting website to help course instructors prepare their lectures.
Article
A group of database researchers, architects, users, and pundits met in May 2008 at the Claremont Resort in Berkeley, CA, to discuss the state of database research and its effects on practice. This was the seventh meeting of this sort over the past 20 years and was distinguished by a broad consensus that the database community is at a turning point in its history, due to both an explosion of data and usage scenarios and major shifts in computing hardware and platforms. This article explores the conclusions of this self-assessment. The theme of the Claremont meeting was that database research and the data-management industry are at a turning point, with unusually rich opportunities for technical advances, intellectual achievement, entrepreneurship, and benefits for science and society. Given the large number of opportunities, it is important for the database research community to address issues that maximize relevance within the field, across computing, and in external fields as well.
Article
Object-oriented database systems (OODBs) need efficient support for manipulation of complex objects. In particular, support of queries involving evaluations of set predicates is often required in handling complex objects. In this paper, we propose a scheme to apply signature file techniques, which were originally invented for text retrieval, to the support of set value accesses, and quantitatively evaluate their potential capabilities. Two signature file organizations, the sequential signature file and the bitsliced signature file, are considered and their performance is compared with that of the nested index for queries involving the set inclusion operator (⊆). We develop a detailed cost model and present analytical results clarifying their retrieval, storage, and update costs. Our analysis shows that the bitsliced signature file is a very promising set access facility in OODBs. 1 INTRODUCTION Advanced database application areas, such as computer aided design, office automation, and...
Possibility Theory: An Approach to Computerized Processing of Uncertainty
  • D Dubois
  • H Prade
D. Dubois, H. Prade, Possibility Theory: An Approach to Computerized Processing of Uncertainty, Plenum Press, New York, 1988.
Fuzzy Databases: Principles and Applications, International Series in Intelligent Technologies
  • F E Petry
  • P Bosc
F.E. Petry, P. Bosc, Fuzzy Databases: Principles and Applications, International Series in Intelligent Technologies, Kluwer Academic Publishers, 1996.