An SGML document of type article IDREF (for the element referencing it). For instance, figures may be referenced in paragraphs (Figure 1 lines 12 and 18).

Source publication

From Structured Documents to Novel Query Facilities

Article

Apr 1996

Structured documents (e.g., SGML) can benefit a lot from database support and more specifically from object-oriented database (OODB) management systems. This paper describes a natural mapping from SGML documents into OODBs and a formal extension of two OODB query languages (one SQL-like and the other calculus) in order to deal with SGML document r...

Context 1

... order to define a document's logical structure, SGML adds descriptive mark-up (tags) in document instances. Each SGML document has: (i) a prologue including a Document Type Definition (DTD), i.e., a set of grammar rules specifying the document's generic logical structure (see Figure 1); and (ii) a document instance containing the information content as well as the tags, i.e., the specific logical structure of the document (see Figure 2). ...
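
The ID/IDREF cross-references mentioned in the figure caption (a paragraph pointing at a figure) can be illustrated with a small check that every reference resolves to a declared identifier. This is only a sketch: it uses Python's XML parser as a stand-in for an SGML parser, and the element and attribute names (`figref`, `refid`, `id`) are hypothetical, not taken from the paper's DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical document instance; names are illustrative, not from the paper.
DOC = """
<article>
  <section>
    <para>See <figref refid="fig1"/> for the DTD.</para>
    <figure id="fig1"><caption>A DTD</caption></figure>
    <para>This one dangles: <figref refid="fig9"/></para>
  </section>
</article>
"""

def dangling_refs(xml_text, id_attr="id", ref_attr="refid"):
    """Return every reference value that matches no declared identifier."""
    root = ET.fromstring(xml_text)
    # Collect all declared identifiers (the ID side).
    ids = {el.get(id_attr) for el in root.iter() if el.get(id_attr) is not None}
    # Keep the references (the IDREF side) that point at nothing.
    return [
        el.get(ref_attr)
        for el in root.iter()
        if el.get(ref_attr) is not None and el.get(ref_attr) not in ids
    ]

dangling = dangling_refs(DOC)
```

A validating SGML parser would perform this check itself from the DTD's attribute declarations; the sketch only makes the constraint explicit.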

Effective Query Expansion for Federated Search

Conference Paper

Jul 2009

While query expansion techniques have been shown to improve retrieval performance in a centralized setting, they have not been well studied in a federated setting. In this paper, we consider how query expansion may be adapted to federated environments and propose several new methods: where focused expansions are used in a selective fashion to pro...

Prime Number-Based Hierarchical Data Labeling Scheme for Relational Databases

Article

Dec 2007

Serhiy Morozov

Hierarchical data structures are an important aspect of many computer science fields including data mining, terrain modeling, and image analysis. A good representation of such data accurately captures the parent-child and ancestor-descendant relationships between nodes. There exist a number of different ways to capture and manage hierarchical data while preserving such relationships. For instance, one may use a custom system designed for a specific kind of hierarchy. Object-oriented databases may also be used to model hierarchical data. Relational database systems, on the other hand, add the benefit of mature mathematical theory, reliable implementations, superior functionality and scalability. Relational databases were not originally designed with hierarchical data management in mind. As a result, abstract information cannot be natively stored in database relations. Database labeling schemes resolve this issue by labeling all nodes in a way that reveals their relationships. Labels usually encode the node's position in a hierarchy as a number or a string that can be stored, indexed, searched, and retrieved from a database. Many different labeling schemes have been developed in the past. All of them may be classified into three broad categories: recursive expansion, materialized path, and nested sets. Each model has its strengths and weaknesses. Each model implementation attempts to reduce the number of weaknesses inherent to the respective model. One of the most prominent implementations of the materialized path model uses the unique characteristics of prime numbers for its labeling purposes. However, the performance and space utilization of this prime number labeling scheme could be significantly improved. This research introduces a new scheme called reusable prime number labeling (rPNL) that reduces the effects of the mentioned weaknesses. The proposed scheme's advantage is discussed in detail, proven mathematically, and experimentally confirmed.
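
The idea behind prime-number labeling can be sketched as follows. This shows only the basic scheme as it is commonly described (each node receives its own prime, and its label is that prime multiplied by the parent's label), not the rPNL variant, whose details the abstract does not give. The function names and the example tree are assumptions made for illustration.

```python
from itertools import count

def primes():
    """Yield 2, 3, 5, ... by trial division (adequate for small trees)."""
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n

def label_tree(children, root):
    """Label each node with the product of the primes on its root path.

    children maps a node to the list of its child nodes; the root gets 1.
    """
    gen = primes()
    labels = {root: 1}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            # Child label = parent label * a prime used by no other node.
            labels[child] = labels[node] * next(gen)
            stack.append(child)
    return labels

def is_ancestor(labels, a, d):
    """a is a proper ancestor of d iff a's label divides d's label."""
    return a != d and labels[d] % labels[a] == 0

# Hypothetical hierarchy: root -> a, b and a -> c.
tree = {"root": ["a", "b"], "a": ["c"]}
labels = label_tree(tree, "root")
```

The ancestor test is a single modulo operation on stored integers, which is what makes such labels convenient to index and query in a relational table. The cost is that labels grow quickly with depth and node count, which is the kind of space weakness the abstract alludes to.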

Interoperability mapping from XML schemas to ER diagrams

Article

Oct 2006
DATA KNOWL ENG

The eXtensible Markup Language (XML) is a de facto standard on the Internet and is now being used to exchange a variety of data structures. This leads to the problem of efficiently storing, querying and retrieving a great amount of data contained in XML documents. Unfortunately, XML data often need to coexist with historical data. At present, the best solution for storing XML into pre-existing data structures is to extract the information from the XML documents and adapt it to the data structures’ logical model (e.g., the relational model of a DBMS). In this paper, we introduce a technique called Xere (XML entity–relationship exchange) to assist the integration of XML data with other data sources. To this aim, we present an algorithm that maps XML schemas into entity–relationship diagrams, discuss its soundness and completeness and show its implementation in XSLT.

Reuse of linked documents through virtual document prescriptions

Chapter

Aug 2006

As the WWW becomes a major source of information, a lot of interest has arisen, not only for searching for information, but for reusing this information in new pages, or directly within applications. Unfortunately HTML tags do not provide a significant level of structure for identifying and extracting information, since they are mostly used for presentation issues. Moreover the simple link mechanism of the Web does not support the controlled traversal of links to related pages. Particularly promising is the proposal for a new standard, XML, which could bring the power of SGML to the Web while keeping the simplicity of HTML. In this paper we present a system and a language that allow reusing of information from various sources, including databases and SGML-like documents, by combining it dynamically to produce a virtual document. The language uses a treelike structure for the representation of the information objects as well as link objects. The paper focuses on the selection and the traversal of XML links to extract information from linked pages. The strength of our approach is to be an SGML-compliant solution, which makes it ready to take full advantage of XML for reusing information from the Web as soon as it is widely used.

Accès transparent et sécurisé à des données largement distribuées

Article

Jul 2006

Béatrice Finance

Providing transparent and secure access to a set of resources requires mediation software that hides the complexity of the underlying architecture from the user, offering facilities for design, integration, querying, and administration that allow data and programs to be shared reliably and efficiently. A great deal of effort has gone into supporting the implementation of such mediation software over the past twenty years, both in the database community and in the distributed systems community. These two communities address the problem of transparent and secure access to widely distributed resources differently, so it is now important to understand the variability of the solutions in terms of functionality, design, algorithms, and architectures in order to identify the different dimensions of the problem. This thesis recounts the research activities I have carried out since 1992 within the SBD (Systèmes et Bases de Données) group of the PRISM laboratory at the Université de Versailles-St-Quentin and, since 2002, within the SMIS (Secure & Mobile Information System) project at INRIA. It details the multidisciplinary aspects of transparent data access, such as the design and development of data mediation systems for the interoperability of relational, object, and XML databases, and the problem of data access in distributed object systems by means of directories. It also addresses the problem of secure data access, particularly in an XML context. This thesis details all of my scientific contributions around these three themes, giving for each the historical context of the time and positioning it with respect to existing work.
In conclusion, it lists a set of perspectives and research directions in light of my multidisciplinary experience and of the growing adoption of XML as a technological foundation for transparent and secure access to widely distributed data.

Queries and computation on the Web

Chapter

Apr 2006

The paper introduces a model of the Web as an infinite, semi-structured set of objects. We reconsider the classical notions of genericity and computability of queries in this new context and relate them to styles of computation prevalent on the Web, based on browsing and searching. We revisit several well-known declarative query languages (first-order logic, Datalog, and Datalog with negation) and consider their computational characteristics in terms of the notions introduced in this paper. In particular, we are interested in languages or fragments thereof which can be implemented by browsing, or by browsing and searching combined. Surprisingly, stratified and well-founded semantics for negation turn out to have basic shortcomings in this context, while inflationary semantics emerges as an appealing alternative.

Querying the World Wide Web

Article

Nov 2003
Int J Digit Libr

The World Wide Web is a large, heterogeneous, distributed collection of documents connected by hypertext links. The most common technology currently used for searching the Web depends on sending information retrieval requests to "index servers" that index as many documents as they can find by navigating the network. One problem with this is that users must be aware of the various index servers (over a dozen of them are currently deployed on the Web), of their strengths and weaknesses, and of the peculiarities of their query interfaces. A more serious problem is that these queries cannot exploit the structure and topology of the document network.

Solving Schema Conversion Problem between XML and Relational Models: Semantic Approach

Article

Jul 2003

The schema conversion problem aims to convert a source schema S in the given model M1 to an equivalent target schema T in the desired model M2. In this paper, we especially study the schema conversion problem between XML and relational models. We present three semantics-based schema conversion algorithms: 1) CPI converts an XML schema to a relational schema while preserving semantic constraints of the original XML schema, 2) NeT derives a nested structured XML schema from a flat relational schema by repeatedly applying the nest operator so that the resulting XML schema becomes hierarchical, and 3) CoT takes a relational schema as input, where multiple tables are interconnected through inclusion dependencies, and generates an equivalent XML schema as output.
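
The nest operator that NeT applies repeatedly can be illustrated on a small flat relation. The sketch below implements only a single nest step in its usual relational-algebra sense (group rows that agree on every other attribute and collect the values of the nested attribute). It is not the NeT algorithm itself, and the relation and attribute names are hypothetical.

```python
def nest(rows, attr):
    """Nest a flat relation on attr.

    Rows agreeing on all attributes other than attr are merged into one
    row whose attr holds the list of collected values. The other values
    must be hashable, because they form the grouping key.
    """
    groups = {}
    for row in rows:
        # Grouping key: every (attribute, value) pair except the nested one.
        key = tuple(sorted((k, v) for k, v in row.items() if k != attr))
        groups.setdefault(key, []).append(row[attr])
    return [{**dict(key), attr: values} for key, values in groups.items()]

# Hypothetical flat relation of (author, title) pairs.
flat = [
    {"author": "A", "title": "T1"},
    {"author": "A", "title": "T2"},
    {"author": "B", "title": "T3"},
]
nested = nest(flat, "title")
```

Each nested row corresponds naturally to a parent element with repeated child elements, which is why a nested relation maps onto a hierarchical XML schema more directly than a flat one does.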

ρ-Queries: Enabling querying for semantic associations on the semantic web

Conference Paper

May 2003

This paper presents the notion of Semantic Associations as complex relationships between resource entities. These relationships capture both the connectivity of entities and the similarity of entities based on a specific notion of similarity called r-isomorphism. It formalizes these notions for the RDF data model by introducing a notion of a Property Sequence as a type. In the context of a graph model such as that for RDF, Semantic Associations amount to certain graph signatures. Specifically, they refer to sequences (i.e. directed paths), here called Property Sequences, between entities, networks of Property Sequences (i.e. undirected paths), or subgraphs of r-isomorphic Property Sequences. The ability to query about the existence of such relationships is fundamental to tasks in analytical domains such as national security and business intelligence, where tasks often focus on finding complex yet meaningful and obscured relationships between entities. However, support for such queries is lacking in contemporary query systems, including those for RDF.

XRANK: ranked keyword search over XML documents

Article

Apr 2003

We consider the problem of efficiently producing ranked results for keyword search queries over hyperlinked XML documents. Evaluating keyword search queries over hierarchical XML documents, as opposed to (conceptually) flat HTML documents, introduces many new challenges. First, XML keyword search queries do not always return entire documents, but can return deeply nested XML elements that contain the desired keywords. Second, the nested structure of XML implies that the notion of ranking is no longer at the granularity of a document, but at the granularity of an XML element. Finally, the notion of keyword proximity is more complex in the hierarchical XML data model. In this paper, we present the XRANK system that is designed to handle these novel features of XML keyword search. Our experimental results show that XRANK offers both space and performance benefits when compared with existing approaches. An interesting feature of XRANK is that it naturally generalizes a hyperlink-based HTML search engine such as Google. XRANK can thus be used to query a mix of HTML and XML documents.

Xere: Towards a Natural Interoperability between XML and ER Diagrams

Conference Paper

Apr 2003
Lect Notes Comput Sci

XML (eXtensible Markup Language) is becoming the standard format for documents on the Internet and is widely used to exchange data. Often, the relevant information contained in XML documents also needs to be stored in legacy databases (DB) in order to integrate the new data with the pre-existing data. In this paper, we introduce a technique for automatic XML-DB integration, which we call Xere. In particular we present, as the first step of Xere, the mapping algorithm which allows the translation of XML Schemas into Entity-Relationship diagrams.
