Conference Paper

Faceted Taxonomy-based Information Management


Abstract

Faceted indexing and searching are being increasingly studied in the literature and used for real-life applications, e.g., for publishing heterogeneous museum collections on the Web. In this paper, we discuss in brief several aspects of managing (faceted) taxonomy-based information sources. Specifically, we discuss (i) the semantic description of faceted taxonomies, based on the compound term composition algebra (CTCA), (ii) the revision of CTCA expressions, as faceted taxonomies evolve, (iii) the dynamic generation of navigational trees (and other applications of CTCA), and (iv) the integration and personalization of taxonomy-based sources.


... Other collected studies presented an overview of FS. The research in the library of "future-generation" catalogs that combine FS outcomes was later evaluated based on the questions of what is known by now regarding FS and the way to design improved research for FS in library catalogs [37][38][39][40][41]; 4. Faceted classification: These analyzed the interface that enables faster and easier access to the required information. The articles [42,43] discussed six main facets of searches: query sessions, space, user attitude, technical requirements, space of contents, and user racial background. ...
Article
In modern society, the increasing number of web search operations on various search engines has become ubiquitous due to the significant number of results presented to the users and the incompetent result-ranking mechanism in some domains, such as medical, law, and academia. As a result, the user is overwhelmed with a large number of misranked or uncategorized search results. One of the most promising technologies to reduce the number of results and provide desirable information to the users is dynamic faceted filters. Therefore, this paper extensively reviews related research articles published in IEEE Xplore, Web of Science, and the ACM digital library. As a result, a total of 170 related research papers were considered and organized into five categories. The main contribution of this paper is to provide a detailed analysis of the faceted search’s fundamental attributes, as well as to demonstrate the motivation from the usage, concerns, challenges, and recommendations to enhance the use of the faceted approach among web search service providers.
... Tzitzikas and Analyti [179] The various control directions in FS taxonomy-based data sources have been reviewed. We specifically described: (a) a compound term composition algebra of FS semantonomic information sources; (b) established and studied the production of FS taxonomies and expressions of the compound term algebra composition. ...
Article
In modern society, the Internet provides massive amounts of heterogeneous information; hence, information overload has become a ubiquitous issue. In this paper, we conduct a large-scale quantitative study of articles dealing with (1) information overload; (2) faceted search; and (3) data filtering in three major databases, namely, Web of Science, ScienceDirect, and IEEE Xplore. These three databases yielded 172 articles, which can be classified into four categories. The first category contains review and survey papers related to information overload. The second category includes papers that concentrate on developing theoretical frameworks to reduce information overload. The third category contains papers dealing with improving the structure or architecture of software for filtering large volumes of data. The fourth category includes papers that provide criteria to evaluate filtering techniques. Finally, our contribution provides further understanding of information overload and gives an important basis for future research. Moreover, we illustrate that dynamic faceted filters are more efficient at reducing information overload.
Chapter
The extensive amount of results obtained from any Web search operation, and the loads of related and/or irrelevant hits presented on the user’s screen, still pose challenges in the information retrieval field, especially if the user is an academic researcher looking for reliable and focused results. Therefore, improving the performance of Web search engines continues to be an active research topic. One of the biggest challenges to search engine optimization arises when a user submits incomplete query statements or fragmented keywords. With broken or fragmented keywords, semantic correlation fails, resulting in inconsistent and oversized search results. This oversized (or overloaded) result problem can be mitigated by utilizing the Exploratory Search technique with a faceted search refining mechanism. This study’s main goal is to present a short review of the existing Exploratory Search techniques and faceted search implementations and to shed light on their main limitations and shortcomings.
Article
The rapid development of information technology in recent years has caused a sharp increase in the amount of communication between citizens and government, which has led to a considerable upsurge in the volume and complexity of information in eGovernment systems. As a result, structuring information has become a critical topic. When considering how to structure information in eGovernment systems, it is important to keep in mind two very important processes in the work of eGovernment: analyzing the current situation and decision making. For successful completion of these processes, a holistic understanding of the situation is necessary. This paper proposes a model that eases the process of analyzing the current situation and the process of decision making by providing a holistic view of the situation to the employees in eGovernment. The model effectively manages information and provides decision makers, and other employees in eGovernment, with relevant pieces of information. It also facilitates the understanding of relationships between different pieces of information in the system. The model is based on a faceted taxonomy. A faceted taxonomy is a set of taxonomies, each one describing the domain of interest from a different, preferably orthogonal, point of view. A faceted taxonomy allows rich information structuring and allows users to easily correlate concepts and explore correlations between these concepts. This approach can also provide an intuitive, hierarchical, visual representation of relations between concepts. Visual representations can help in better understanding complex topics. The model provides a strong foundation for further development of visualization techniques. Using this model can also improve all search and browsing methods on the information in the system, and allow easy transformations of the structure. This paper clarifies the criteria selection process for the taxonomies used in the model and provides guidelines for practical implementation of the model. The model also offers recommendations for further exploration of this field.
Book
This book offers an approach to Drupal as a tool for libraries, and covers different uses of Drupal in libraries, archives and information services.
Conference Paper
There are currently two dominant interface types for searching and browsing large image collections: keyword-based search, and searching by overall similarity to sample images. We present an alternative based on enabling users to navigate along conceptual dimensions that describe the images. The interface makes use of hierarchical faceted metadata and dynamically generated query previews. A usability study, in which 32 art history students explored a collection of 35,000 fine arts images, compares this approach to a standard image search interface. Despite the unfamiliarity and power of the interface (attributes that often lead to rejection of new search interfaces), the study results show that 90% of the participants preferred the metadata approach overall, 97% said that it helped them learn more about the collection, 75% found it more flexible, and 72% found it easier to use than a standard baseline system. These results indicate that a category-based approach is a successful way to provide access to image collections.
Conference Paper
This paper considers Peer-to-Peer systems in which peers employ taxonomies for describing the contents of their objects and for formulating semantic-based queries to the other peers of the system. As each peer can use its own taxonomy, peers are equipped with inter-taxonomy mappings in order to carry out the required translation tasks. As these systems are ad-hoc, the peers should be able to create or revise these mappings on demand and at run-time. For this reason, we introduce an ostensive data-driven method for automatic mapping and specialize it for the case of taxonomies.
Article
We propose a mediator model for providing integrated and unified access to multiple taxonomy-based sources. Each source comprises a taxonomy and a database that indexes objects under the terms of the taxonomy. A mediator comprises a taxonomy and a set of relations between the mediator's and the sources' terms, called articulations. By combining different modes of query evaluation at the sources and the mediator and different types of query translation, a flexible, efficient scheme of mediator operation is obtained that can accommodate various application needs and levels of answer quality. We adopt a simple conceptual modeling approach (taxonomies and intertaxonomy mappings) and we illustrate its advantages in terms of ease of use, uniformity, scalability, and efficiency. These characteristics make this proposal appropriate for a large-scale network of sources and mediators.
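As a rough illustration of the articulation idea, the sketch below translates a mediator term into source terms and unions the objects the sources return. It is only one simple reading of the abstract, not the evaluation modes defined in the paper: the function names, the data layout, and the example terms and sources are all hypothetical, and each source is assumed to answer only with objects indexed directly under the requested term.

```python
def terms_below(term, children):
    """Return the mediator term plus every mediator term it subsumes."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(children.get(t, ()))
    return seen


def translate(term, children, articulations):
    """Map a mediator term to the source terms articulated under it, per source."""
    per_source = {}
    for t in terms_below(term, children):
        for source, source_term in articulations.get(t, ()):
            per_source.setdefault(source, set()).add(source_term)
    return per_source


def answer(term, children, articulations, sources):
    """Union of the objects each source indexes under its translated terms."""
    result = set()
    for source, source_terms in translate(term, children, articulations).items():
        for source_term in source_terms:
            result |= sources[source].get(source_term, set())
    return result


# Hypothetical mediator taxonomy: broader term -> narrower terms.
children = {"Electronics": ["Cameras"]}
# Hypothetical articulations: mediator term -> (source, source term) pairs it subsumes.
articulations = {
    "Cameras": [("S1", "PhotoCameras"), ("S2", "DSLR")],
    "Electronics": [("S1", "Devices")],
}
# Hypothetical sources: source -> term -> indexed objects.
sources = {
    "S1": {"PhotoCameras": {"a"}, "Devices": {"a", "b"}},
    "S2": {"DSLR": {"c"}},
}

cameras = answer("Cameras", children, articulations, sources)
electronics = answer("Electronics", children, articulations, sources)
```

Keeping translation separate from evaluation leaves room for other translation or evaluation strategies to be swapped in without touching the rest of the sketch.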
Article
This article presents the semantic portal MuseumFinland for publishing heterogeneous museum collections on the Semantic Web. It is shown how museums with their semantically rich and interrelated collection content can create a large, consolidated semantic collection portal together on the web. By sharing a set of ontologies, it is possible to make collections semantically interoperable, and provide the museum visitors with intelligent content-based search and browsing services to the global collection base. The architecture underlying MuseumFinland separates generic search and browsing services from the underlying application dependent schemas and metadata by a layer of logical rules. As a result, the portal creation framework and software developed has been applied successfully to other domains as well. MuseumFinland got the Semantic Web Challenge Award (second prize) in 2004.
Conference Paper
Databases of text and text-annotated data constitute a significant fraction of the information available in electronic form. Searching and browsing are the typical ways that users locate items of interest in such databases. Interfaces that use multifaceted hierarchies represent a new powerful browsing paradigm which has been proven to be a successful complement to keyword searching. Thus far, multifaceted hierarchies have been created manually or semi-automatically, making it difficult to deploy multifaceted interfaces over a large number of databases. We present automatic and scalable methods for creation of multifaceted interfaces. Our methods are integrated with traditional relational databases and can scale well for large databases. Furthermore, we present methods for selecting the best portions of the generated hierarchies when the screen space is not sufficient for displaying all the hierarchy at once. We apply our technique to a range of large data sets, including annotated images, television programming schedules, and web pages. The results are promising and suggest directions for future research.
Conference Paper
In this study we address the problem of answering queries over information sources storing objects which are indexed by terms arranged in a taxonomy. We examine query languages of different expressivity and sources with different kinds of taxonomies. In the simplest kind, the taxonomy includes just term-to-term subsumption links. This case is used as a basis for further developments, in which we consider taxonomies consisting of term-to-queries links. An algorithm for query evaluation is presented for this kind of taxonomies, and it is shown that the addition of negation to the query language leads to intractability. Finally, query-to-query taxonomies are considered.
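For the simplest case mentioned above (term-to-term subsumption links), answering a single-term query can be sketched as collecting the objects indexed under the term or under any term it subsumes. This is a minimal sketch under that assumption, not the algorithm presented in the paper; the function names and the example taxonomy and index are hypothetical.

```python
from collections import defaultdict


def build_children(subsumption_links):
    """Turn (narrower, broader) pairs into a broader -> narrower-terms map."""
    children = defaultdict(set)
    for narrower, broader in subsumption_links:
        children[broader].add(narrower)
    return children


def answer(term, children, index):
    """Objects indexed under `term` or under any term it subsumes."""
    seen, stack, result = set(), [term], set()
    while stack:
        t = stack.pop()
        if t in seen:
            continue  # guards against revisiting terms reachable via several paths
        seen.add(t)
        result |= index.get(t, set())
        stack.extend(children.get(t, ()))
    return result


# Hypothetical taxonomy and object index.
links = [("Cameras", "Electronics"), ("DSLR", "Cameras"), ("Phones", "Electronics")]
index = {"DSLR": {"o1"}, "Cameras": {"o2"}, "Phones": {"o3"}}
children = build_children(links)

camera_objects = answer("Cameras", children, index)
all_objects = answer("Electronics", children, index)
```

Conjunctions and disjunctions of terms could then be answered by intersecting or uniting such sets.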
Conference Paper
compound terms (conjunctions of terms) over the faceted taxonomy. Faceted taxonomies carry a number of well known advantages over single hierarchies in terms of building and maintaining them, as well as using them in multicriteria indexing (e.g. see [3]). FASTAXON is a system for building big (compound) taxonomies based on the above mentioned idea. Using the system, the designer at first defines a number of facets and assigns to each one of them one taxonomy. After that the system can generate...
Article
Although symbolic data tables summarize huge sets of data, they can still become very large in size. This paper proposes a method for compressing a symbolic data table using the recently emerged Compound Term Composition Algebra. One charisma of CTCA is that the closed world hypotheses of its operations can lead to a remarkably high "compression ratio". Apart from having much lower storage space requirements, the compacted form allows designing more efficient algorithms for symbolic data analysis.
Article
A materialized faceted taxonomy is an information source where the objects of interest are indexed according to a faceted taxonomy. This paper shows how from a materialized faceted taxonomy, we can mine an expression of the Compound Term Composition Algebra that specifies exactly those compound terms (conjunctions of terms) that have non-empty interpretation. The mined expressions can be used for encoding in a very compact form (and subsequently reusing), the domain knowledge that is stored in existing materialized faceted taxonomies. A distinctive characteristic of this mining task is that the focus is given on minimizing the storage space requirements of the mined set of compound terms. This paper formulates the problem of expression mining, gives several algorithms for expression mining, analyzes their computational complexity, provides techniques for optimization, and discusses several novel applications that now become possible.
Article
A faceted taxonomy is a set of taxonomies each describing the application domain from a different (preferably orthogonal) point of view. CTCA is an algebra that allows specifying the set of meaningful compound terms (meaningful conjunctions of terms) over a faceted taxonomy in a flexible and efficient manner. However, taxonomy updates may turn a CTCA expression e not well-formed and may turn the compound terms specified by e to no longer reflect the domain knowledge originally expressed in e. This paper shows how we can revise e after a taxonomy update and reach an expression e' that is both well-formed and whose semantics (compound terms defined) is as close as possible to the semantics of the original expression e before the update. Various cases are analyzed and the revising algorithms are given. The proposed technique can enhance the robustness and usability of systems that are based on CTCA and allows optimizing several other tasks where CTCA can be used (including mining and compressing).
Article
This paper considers Peer-to-Peer systems in which peers employ taxonomies for describing the contents of their objects and for formulating semantic-based queries to the other peers of the system.
Conference Paper
Faceted classification allows one to model applications with complex classification hierarchies using orthogonal dimensions. Recent work has examined the use of faceted classification for browsing and search. In this paper, we go further by developing a general query language, called the entity algebra, for hierarchically classified data. The entity algebra is compositional, with query inputs and outputs being sets of entities. Our language has linear data complexity in terms of space and quadratic data complexity in terms of time. We compare the entity algebra with the relational algebra in terms of expressiveness. We also describe an implementation of the language in the context of two application domains, one for an archeological database, and another for a human anatomy database.
Article
There are currently two dominant interface types for searching and browsing large image collections: keyword-based search, and searching by overall similarity to sample images. We present an alternative based on enabling users to navigate along conceptual dimensions that describe the images. The interface makes use of hierarchical faceted metadata and dynamically generated query previews. A usability study, in which 32 art history students explored a collection of 35,000 fine arts images, compares this approach to a standard image search interface. Despite the unfamiliarity and power of the interface (attributes that often lead to rejection of new search interfaces), the study results show that 90% of the participants preferred the metadata approach overall, 97% said that it helped them learn more about the collection, 75% found it more flexible, and 72% found it easier to use than a standard baseline system. These results indicate that a category-based approach is a successful way to provide access to image collections.
Article
A faceted taxonomy is a set of taxonomies, each describing a given knowledge domain from a different aspect. The indexing of the domain objects is done using compound terms, i.e. conjunctive combinations of terms from the taxonomies. A faceted taxonomy has several advantages over a single taxonomy, including conceptual clarity, compactness, and scalability. A drawback, however, is the cost of identifying compound terms that are invalid, i.e. terms that do not apply to any object of the domain. This need arises both in indexing and retrieval, and involves considerable human effort for specifying the valid compound terms one by one. In this paper, we propose and present in detail an algebra which can be used to specify the set of valid compound terms in an efficient and flexible manner. It works on the basis of the original simple terms of the facets and a small set of positive and/or negative statements. In each algebraic operation, we adopt a closed-world assumption with respect to the declared positive or negative statements. In this paper we elaborate on the properties of the algebraic operators and we describe application and methodological issues.
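The closed-world reading of positive and negative statements can be illustrated with a small sketch. It is a simplification, not the algebra's formal definition: compound terms are restricted here to exactly one term per facet, the two facets and their terms are made up for illustration, and the function names `plus_product` and `minus_product` are labels chosen for this sketch. With negative statements, every combination is taken as valid unless it is narrower than a declared-invalid compound term; with positive statements, a combination is valid only if it is broader than a declared-valid one.

```python
from itertools import product

# Hypothetical facets: each term maps to its parent (None for the facet root).
LOCATION = {"Location": None, "Islands": "Location", "Mainland": "Location",
            "Crete": "Islands", "Olympus": "Mainland"}
SPORTS = {"Sports": None, "SeaSports": "Sports", "WinterSports": "Sports"}
PARENT = {**LOCATION, **SPORTS}


def broader_or_equal(term):
    """Return the term together with all of its ancestors."""
    out = set()
    while term is not None:
        out.add(term)
        term = PARENT[term]
    return out


def is_narrower(s, t):
    """Compound term s is narrower than (or equal to) t if every term of t
    subsumes some term of s."""
    return all(any(b in broader_or_equal(a) for a in s) for b in t)


def minus_product(facets, negatives):
    """One-term-per-facet combinations, minus those narrower than a negative statement."""
    combos = (frozenset(c) for c in product(*facets))
    return {c for c in combos if not any(is_narrower(c, n) for n in negatives)}


def plus_product(facets, positives):
    """One-term-per-facet combinations that are broader than some positive statement."""
    combos = (frozenset(c) for c in product(*facets))
    return {c for c in combos if any(is_narrower(p, c) for p in positives)}


# One negative statement: winter sports are not available on islands.
valid_by_negation = minus_product([LOCATION, SPORTS],
                                  [frozenset({"Islands", "WinterSports"})])
# One positive statement: sea sports are available on Crete.
valid_by_assertion = plus_product([LOCATION, SPORTS],
                                  [frozenset({"Crete", "SeaSports"})])
```

In this toy setting a single negative statement rules out both `{Islands, WinterSports}` and `{Crete, WinterSports}`, which shows how a few statements can stand in for a one-by-one enumeration of valid compound terms.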
Conference Paper
Governments, especially local ones, are using the web to provide a number of services that are mainly informative and aim at improving the quality of life of citizens and at promoting the local community “abroad”. These services include among others, job placement services, tourist information (hotels, restaurants, etc.), yellow pages to promote local industries and activities, and are supplied in addition to institutional services such as law, regulations and opportunities information bases. We argue that traditional methods commonly used by administrations to implement these services do not really work, and propose a new access paradigm based on conceptual manipulation. This paradigm is applied to a job placement example.
Conference Paper
We describe Castanet, an algorithm for automatically generating hierarchical faceted metadata from textual descriptions of items, to be incorporated into browsing and navigation interfaces for large information collections. From an existing lexical database (such as WordNet), Castanet carves out a structure that reflects the contents of the target information collection; moderate manual modifications improve the outcome. The algorithm is simple yet effective: a study conducted with 34 information architects finds that Castanet achieves higher quality results than other automated category creation algorithms, and 85% of the study participants said they would like to use the system for their work.
Article
Commonly, for retrieving the desired information from an information source (knowledge base or information base), the user has to use the query language that is provided by the system. This is a big barrier for many ordinary users and the resulting interaction style is rather inflexible. In this paper we give the theoretical foundations of an interaction scheme that allows users to retrieve the objects of interest without having to be familiar with the conceptual schema of the source or with the supported query language. Specifically, we describe an interaction manager that provides a quite flexible interaction scheme by unifying several well-known interaction schemes. Furthermore, we show how this scheme can be applied to taxonomy-based sources by providing all needed algorithms and reporting their computational complexity.
Article
The central component of the technology reported in this article is a software reuse library organized around a faceted classification scheme. The system supports search and retrieval of reusable components and librarian functions such as cataloging and classification. To be effective, the system must operate within the context of an organizational infrastructure aimed at promoting reusability. Definition, implementation, and management of such infrastructure is considered part of the technology. The first part of this article introduces a faceted-based library system and reports on the experiences with a first prototype. It includes justification for using faceted classification and discusses the need for librarian and organizational support. The second part reports on the deployment of reuse library technology.
Conference Paper
E-commerce is one of the most active and important Internet application areas, yet selecting a product to buy is normally quite a frustrating experience. In this paper, we identify the principal user tasks: the thinning-game and the end game. The thinning-game is used to find a suitably small set of candidate items on the basis of personal specifications. The end game is used to compare the features of a set of candidate items in order to find a single "right" item to purchase. We derive requirements for these two tasks and use them to review traditional techniques and to propose efficient solutions. The thinning game is solved by dynamic taxonomies, a powerful knowledge management model that also provides for multilingual access and easy user preference tracking. For the end game, which is inherently an information presentation problem, a color-coding scheme is used.
Conference Paper
Experience with the development, implementation, and deployment of reuse library technology is reported. The focus is on organizing software collections for reuse using faceted classifications. Briefly described are the successful GTE Data Services' Asset Management Program and the steps taken at Contel for furthering reuse technology. The technology developed for reuse libraries is presented, followed by a description of how it was transferred. The experience described indicates that reuse library technology is available and transferable, and that it definitely has a positive financial impact on the organization implementing it.
The Colon Classification
  • S R Ranganathan
  • S Artandi
FASTAXON: A system for FAST (and Faceted) TAXONomy design
  • Y Tzitzikas
  • R Launonen
  • M Hakkarainen
  • P Kohonen
  • T Leppanen
  • E Simpanen
  • H Tornroos
  • P Uusitalo
  • P Vanska
Y. Tzitzikas, R. Launonen, M. Hakkarainen, P. Kohonen, T. Leppanen, E. Simpanen, H. Tornroos, P. Uusitalo, and P. Vanska. "FASTAXON: A system for FAST (and Faceted) TAXONomy design." In Procs. of 23rd Int. Conf. on Conceptual Modeling (ER'2004), 2004. (an on-line demo is available at http://fastaxon.erve.vtt.fi/).