Figure 2 - uploaded by Serge Abiteboul
An SGML document of type article IDREF (for the element referencing it). For instance, figures may be referenced in paragraphs (Figure 1 lines 12 and 18).

Source publication
Article
Structured documents (e.g., SGML) can benefit a lot from database support and more specifically from object-oriented database (OODB) management systems. This paper describes a natural mapping from SGML documents into OODB's and a formal extension of two OODB query languages (one SQL-like and the other calculus) in order to deal with SGML document r...

Context in source publication

Context 1
... order to define a document's logical structure, SGML adds descriptive mark-up (tags) in document instances. Each SGML document has: (i) A prologue including a Document Type Definition (DTD), i.e., a set of grammar rules specifying the document generic logical structure (see Figure 1); and (ii) a document instance containing the information content as well as the tags e.g., the specific logical structure of the document (see Figure 2). ...

Similar publications

Conference Paper
While query expansion techniques have been shown to improve retrieval performance in a centralized setting, they have not been well studied in a federated setting. In this paper, we consider how query expansion may be adapted to federated environments and propose several new methods: where focused expansions are used in a selective fashion to pro...

Citations

... A table may be created for every level in a hierarchy and then easily searched using existing one-to-many relationships between the tables. This technique may be used to create an entire database schema based on document type definition (DTD) to store documents of previously known structure (Shanmugasundaram et al. 1999, Christophides, Abiteboul, Cluet, & Scholl 1994). This approach is clearly the fastest, because it takes full advantage of the database optimization algorithms, indexes, etc. ...
Article
Hierarchical data structures are an important aspect of many computer science fields including data mining, terrain modeling, and image analysis. A good representation of such data accurately captures the parent-child and ancestor-descendant relationships between nodes. There exist a number of different ways to capture and manage hierarchical data while preserving such relationships. For instance, one may use a custom system designed for a specific kind of hierarchy. Object-oriented databases may also be used to model hierarchical data. Relational database systems, on the other hand, add an additional benefit of mature mathematical theory, reliable implementations, superior functionality and scalability. Relational databases were not originally designed with hierarchical data management in mind. As a result, abstract information cannot be natively stored in database relations. Database labeling schemes resolve this issue by labeling all nodes in a way that reveals their relationships. Labels usually encode the node's position in a hierarchy as a number or a string that can be stored, indexed, searched, and retrieved from a database. Many different labeling schemes have been developed in the past. All of them may be classified into three broad categories: recursive expansion, materialized path, and nested sets. Each model has its strengths and weaknesses. Each model implementation attempts to reduce the number of weaknesses inherent to the respective model. One of the most prominent implementations of the materialized path model uses the unique characteristics of prime numbers for its labeling purposes. However, the performance and space utilization of this prime number labeling scheme could be significantly improved. This research introduces a new scheme called reusable prime number labeling (rPNL) that reduces the effects of the mentioned weaknesses. The advantage of the proposed scheme is discussed in detail, proven mathematically, and experimentally confirmed.
... The first approach, namely the structure-mapping approach, creates relational schemas based on the structure of XML documents (deduced from their DTD, if available). Basically, with this approach a relation is created for each element type in the XML documents, [Christophides et al. 1994; Abiteboul et al. 1997], and a database schema is defined for each XML document structure or DTD. This is the approach we used in the Xere algorithm. ...
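The structure-mapping idea in the excerpt above (one relation per element type) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the dictionary encoding of a DTD, the `relations_for` helper, and the column naming are hypothetical and are not taken from Xere or from the cited papers.

```python
from typing import Dict, List


def relations_for(element_types: Dict[str, List[str]]) -> List[str]:
    """Emit one CREATE TABLE statement per element type.

    `element_types` maps an element name to the names of its child
    elements (a hypothetical, simplified stand-in for a DTD).
    """
    # Invert the parent -> children map so each element knows its parents.
    parents: Dict[str, List[str]] = {}
    for parent, children in element_types.items():
        for child in children:
            parents.setdefault(child, []).append(parent)

    statements = []
    for name, children in element_types.items():
        columns = ["id INTEGER PRIMARY KEY"]
        # One foreign key per possible parent element type.
        for parent in parents.get(name, []):
            columns.append(f"{parent}_id INTEGER REFERENCES {parent}(id)")
        # Leaf elements carry text content (an assumption of this sketch).
        if not children:
            columns.append("content TEXT")
        statements.append(f"CREATE TABLE {name} ({', '.join(columns)});")
    return statements


# Hypothetical element structure for an "article" document type.
article_dtd = {
    "article": ["title", "section"],
    "title": [],
    "section": ["title", "paragraph"],
    "paragraph": [],
}

for statement in relations_for(article_dtd):
    print(statement)
```

Parent-child nesting becomes a one-to-many foreign key here, which is what allows a hierarchy level to be searched with ordinary joins.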
Article
The eXtensible Markup Language (XML) is a de facto standard on the Internet and is now being used to exchange a variety of data structures. This leads to the problem of efficiently storing, querying and retrieving a great amount of data contained in XML documents. Unfortunately, XML data often need to coexist with historical data. At present, the best solution for storing XML into pre-existing data structures is to extract the information from the XML documents and adapt it to the data structures’ logical model (e.g., the relational model of a DBMS). In this paper, we introduce a technique called Xere (XML entity–relationship exchange) to assist the integration of XML data with other data sources. To this aim, we present an algorithm that maps XML schemas into entity–relationship diagrams, discuss its soundness and completeness and show its implementation in XSLT.
... In the example in figure 2, expressions like body.table[#FIRST+1] and $i.name are path expressions; they perform selections on the results in an OQL-like manner (our syntax is inspired by the language described in [10], now named POQL). A path expression can be seen as the traversal of a tree: an expression .L (dot selection) finds the children with label L one level down the tree, an expression ..L (dot-dot selection) finds the children with label L at any level down the tree. ...
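The dot and dot-dot selections described above can be sketched over a plain labelled tree. This is a minimal Python illustration, not the cited system's implementation: the `Node` class and the `dot` and `dot_dot` helpers are hypothetical names, and positional selectors such as `[#FIRST+1]` are not modelled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A node of an untyped labelled tree (hypothetical structure)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def dot(node: Node, label: str) -> List[Node]:
    """`.L`: children of `node` labelled `label`, one level down."""
    return [child for child in node.children if child.label == label]


def dot_dot(node: Node, label: str) -> List[Node]:
    """`..L`: nodes labelled `label` at any level below `node`."""
    found: List[Node] = []
    for child in node.children:
        if child.label == label:
            found.append(child)
        # Recurse so that deeper matches are collected in document order.
        found.extend(dot_dot(child, label))
    return found


# A small hypothetical document tree.
doc = Node("body", [
    Node("table", [Node("name")]),
    Node("section", [Node("table", [Node("name")]), Node("name")]),
])

print(len(dot(doc, "table")))      # tables directly under body
print(len(dot_dot(doc, "table")))  # tables at any depth
```

In this tree, `dot(doc, "table")` sees only the table directly under `body`, while `dot_dot(doc, "table")` also reaches the one nested inside `section`.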
... Other works have looked at querying structured or semi-structured information, either from a database point of view [3, 10]; from a document processing point of view, as with HyQ [2] or SgmlQL [15]; or for hypertexts [8, 7, 18]. Our selection of objects is similar to the generalized path expressions found in POQL [10] or Gram [7]; however, their interpretation is much simplified since our paths are concrete paths into untyped tree structures that are not constrained to a database schema. This need to query semi-structured data without strict typing conventions has been recognized in Lorel [5], which uses coercion between types to address the problem. ...
Chapter
As the WWW becomes a major source of information, a lot of interest has arisen, not only for searching for information, but for reusing this information in new pages, or directly within applications. Unfortunately HTML tags do not provide a significant level of structure for identifying and extracting information, since they are mostly used for presentation issues. Moreover the simple link mechanism of the Web does not support the controlled traversal of links to related pages. Particularly promising is the proposal for a new standard, XML, which could bring the power of SGML to the Web while keeping the simplicity of HTML. In this paper we present a system and a language that allow reusing of information from various sources, including databases and SGML-like documents, by combining it dynamically to produce a virtual document. The language uses a treelike structure for the representation of the information objects as well as link objects. The paper focuses on the selection and the traversal of XML links to extract information from linked pages. The strength of our approach is to be an SGML-compliant solution, which makes it ready to take full advantage of XML for reusing information from the Web as soon as it is widely used.
... DQL extends the capabilities of LDAP with regular-expression search [53,52]. Note that regular-expression search is a feature already offered by the XPath [155] and XQuery [156] languages for querying XML documents, which themselves built on the work carried out on the extensions proposed for OQL [32] to support semi-structured documents [5,33,34]. In this chapter, we have chosen to describe the extended directory service that we defined for the CORBA world, along with its query language DQL and its associated execution model. We compare our work with similar approaches, and then conclude. ...
... The extensions we proposed, with the definition of path and selection variables, make it possible to filter a set of entries on their containment links and to keep in the selection result the containment links that connected them. These extensions are similar to the extensions proposed in OQL for supporting semi-structured documents [5,33,34]. DQL remains a simple language to use and offers a pragmatic compromise between XPath and XQuery suited to the directory context. in a single query, even if that query involves data distributed across several distinct sites. ...
Article
Providing transparent and secure access to a set of resources requires defining mediation software that makes the complexity of the underlying architecture transparent to the user by offering design, integration, querying, and administration facilities that allow data and programs to be shared reliably and efficiently. A very large effort has been made over the last twenty years to help implement such mediation software, both by the database community and by the distributed systems community. These two communities address the problem of transparent and secure access to widely distributed resources differently, so it is important today to understand the variability of the solutions in terms of functionality, design, algorithms, and architectures in order to identify the different dimensions of the problem. This dissertation retraces all of my research activities carried out from 1992 within the SBD (Systèmes et Bases de Données) theme of the PRISM laboratory at the Université de Versailles-St-Quentin and, since 2002, as part of the SMIS (Secure & Mobile Information System) project at INRIA. It details the multidisciplinary aspects of transparent data access, such as the design and development of data mediation systems for the interoperability of relational, object, and XML databases, and the problem of data access in distributed object systems using directories. It also addresses the problem of secure data access, particularly in an XML context. This dissertation details all of my scientific contributions around these three themes, indicates for each of them the historical context of the time, and positions it with respect to existing work.
To conclude, it lists a set of perspectives and research directions in light of my multidisciplinary experience and of the observation that XML is increasingly being adopted as the technological foundation for transparent and secure access to widely distributed data.
... The data model we use is similar to several models for unstructured data recently introduced, e.g., [6,8,22]. The Web consists of an infinite set of objects. ...
... Query languages for the Web have attracted much attention recently, e.g., W3QL [17] that focuses on extensibility, WebSQL [19] that provides a formal semantics and introduces a notion of locality, or WebLog [18] that is based on a Datalog-like syntax. Since HTML (the core structure of the Web) can be viewed as an instance of SGML, the work on querying structured documents, e.g., [8,12], is also pertinent, along with work on querying semistructured data (see [1] for a survey). The work on query languages for hypertext structures, e.g., [9,20,21], is also relevant. ...
Chapter
The paper introduces a model of the Web as an infinite, semi-structured set of objects. We reconsider the classical notions of genericity and computability of queries in this new context and relate them to styles of computation prevalent on the Web, based on browsing and searching. We revisit several well-known declarative query languages (first-order logic, Datalog, and Datalog with negation) and consider their computational characteristics in terms of the notions introduced in this paper. In particular, we are interested in languages or fragments thereof which can be implemented by browsing, or by browsing and searching combined. Surprisingly, stratified and well-founded semantics for negation turn out to have basic shortcomings in this context, while inflationary semantics emerges as an appealing alternative.
... There has been work in query languages for hypertext documents (Beeri and Kornatzky 1990; Consens and Mendelzon 1989; Minohara and Watanabe 1993) as well as query languages for structured or semi-structured documents (Abiteboul et al. 1993; Christophides et al. 1994; Güting et al. 1989; Navarro and Baeza-Yates 1995; Quass et al. 1995). Our work differs significantly from both these streams. ...
... One of the difficulties in building an SQL-like query language for the Web is the absence of a database schema. Instead of trying to model document structure with some kind of object-oriented schema, as in Christophides et al. (1994); Quass et al. (1995), we take a minimalist relational approach. At the highest level of abstraction, every Web object is identified by its Uniform Resource Locator (URL) and has a binary content whose interpretation depends on its type (HTML, Postscript, image, audio, etc.). ...
... For example, we might be interested in links pointing to nodes in Canada such that their labels do not contain the strings "Back" or "Home." Second, we would like to make use of internal document structure when it is known, along the lines of Christophides et al. (1994) and Quass et al. (1995). ...
Article
The World Wide Web is a large, heterogeneous, distributed collection of documents connected by hypertext links. The most common technology currently used for searching the Web depends on sending information retrieval requests to "index servers" that index as many documents as they can find by navigating the network. One problem with this is that users must be aware of the various index servers (over a dozen of them are currently deployed on the Web), of their strengths and weaknesses, and of the peculiarities of their query interfaces. A more serious problem is that these queries cannot exploit the structure and topology of the document network.
... Conversion between different models has been extensively investigated. For instance, [19] deals with conversion problems in the OODB area; since OODB is a richer environment than RDB, their work is not readily applicable to our application. The logical database design methods and their associated conversion techniques to other data models have been extensively studied in ER research. ...
Article
Schema conversion problem aims to convert a source schema S in the given model M1 to an equivalent target schema T in the desired model M2. In this paper, we especially study schema conversion problem between XML and relational models. We present three semantics-based schema conversion algorithms: 1) CPI converts an XML schema to a relational schema while preserving semantic constraints of the original XML schema, 2) NeT derives a nested structured XML schema from a flat relational schema by repeatedly applying the nest operator so that the resulting XML schema becomes hierarchical, and 3) CoT takes a relational schema as input, where multiple tables are interconnected through inclusion dependencies and generates an equivalent XML schema as output.
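The nest operator mentioned in this abstract can be illustrated with a short Python sketch. It shows the textbook relational nest (group tuples that agree on every other attribute and collect the nested attribute into a set); it is not NeT itself, and the `nest` function and the sample rows are hypothetical.

```python
from typing import Any, Dict, List


def nest(rows: List[Dict[str, Any]], attribute: str) -> List[Dict[str, Any]]:
    """Group rows that agree on all other attributes, collecting the values
    of `attribute` into a frozenset (a sketch of the relational nest)."""
    groups: Dict[tuple, set] = {}
    for row in rows:
        # The grouping key is every (name, value) pair except the nested one.
        key = tuple(sorted((k, v) for k, v in row.items() if k != attribute))
        groups.setdefault(key, set()).add(row[attribute])
    return [
        dict(key, **{attribute: frozenset(values)})
        for key, values in groups.items()
    ]


# Hypothetical flat relation: one row per (author, paper) pair.
flat = [
    {"author": "A", "paper": "p1"},
    {"author": "A", "paper": "p2"},
    {"author": "B", "paper": "p3"},
]

nested = nest(flat, "paper")
print(nested)
```

Applying `nest` repeatedly on different attributes yields progressively deeper groupings, which is the intuition behind deriving a hierarchical schema from a flat one.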
... Although, these systems provide powerful and expressive capability, allowing users to query for data without having in-depth schema knowledge, most of them work on the premise that the goal of a query is to find data entities but not complex relationships such as Semantic Associations. Some of these systems [19][22] support paths as first class entities and allow for path variables to be used outside of the FROM clause, i.e. to be returned as a result of a query which suggests that queries for ρ-pathAssociations could be supported. However, they typically assume a simpler data model which is a rooted directed graph without the nuances of RDF such as multiple classification and property hierarchies. ...
... However, they typically assume a simpler data model which is a rooted directed graph without the nuances of RDF such as multiple classification and property hierarchies. Furthermore, the more complex Semantic Associations such as the ρ-joinAssociation and ρ-Isomorphism are not supported, even in systems like [22] which provide some functions that range over path variables, e.g., the difference function which returns the difference in the set of paths that originate from two nodes. ...
Conference Paper
This paper presents the notion of Semantic Associations as complex relationships between resource entities. These relationships capture both a connectivity of entities as well as similarity of entities based on a specific notion of similarity called r-isomorphism. It formalizes these notions for the RDF data model, by introducing a notion of a Property Sequence as a type. In the context of a graph model such as that for RDF, Semantic Associations amount to certain graph signatures. Specifically, they refer to sequences (i.e. directed paths) here called Property Sequences, between entities, networks of Property Sequences (i.e. undirected paths), or subgraphs of r-isomorphic Property Sequences. The ability to query about the existence of such relationships is fundamental to tasks in analytical domains such as national security and business intelligence, where tasks often focus on finding complex yet meaningful and obscured relationships between entities. However, support for such queries is lacking in contemporary query systems, including those for RDF.
... For example, consider again the query 'XQL language'. Although the <paper> element in lines 5-24 contains a sub-element <body> (lines 11-23) that contains all of the query keywords, the <paper> element also contains independent occurrences of the query keywords in the sub-elements <title> (line 6) and <abstract> (lines 9-10). Thus, the <paper> element is also returned as a result of the query. ...
... The algorithm works by merging the inverted lists by the Dewey ID (lines 6-9), and computing the longest common prefix of the current entry and the previous entry stored in the Dewey stack (lines 10-11). It then pops all the Dewey stack components that are not part of the common prefix (lines 12-24) and if any of the popped components contain all the query keywords, they are added to the result heap (lines 15-18). ...
... The RDIL algorithm thus determines the longest common prefix (lcp) of a Dewey ID that contains all the query keywords by repeatedly probing the B+-tree for each query keyword (lines 11-16). Once the lcp is determined, its ranks and posLists are obtained using regular B+-tree range scans (line 19). ...
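The longest-common-prefix step in the excerpt above can be sketched in Python. Because a Dewey ID spells out the path from the root, the longest common prefix of two IDs identifies their deepest common ancestor. The dotted-string encoding and the helper names `parse_dewey` and `longest_common_prefix` are assumptions of this sketch; the stack handling, inverted lists, and ranking of the cited algorithm are not reproduced.

```python
from typing import Tuple

DeweyId = Tuple[int, ...]


def parse_dewey(text: str) -> DeweyId:
    """Parse a dotted Dewey ID such as "0.1.2" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def longest_common_prefix(a: DeweyId, b: DeweyId) -> DeweyId:
    """Return the shared leading components of two Dewey IDs."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return tuple(prefix)


previous = parse_dewey("0.1.2.3")
current = parse_dewey("0.1.4")

# Components of `previous` beyond the common prefix would be the ones
# popped from the stack in the excerpt's description.
lcp = longest_common_prefix(previous, current)
print(lcp)                 # (0, 1)
print(previous[len(lcp):])  # (2, 3)
```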
Article
We consider the problem of efficiently producing ranked results for keyword search queries over hyperlinked XML documents. Evaluating keyword search queries over hierarchical XML documents, as opposed to (conceptually) flat HTML documents, introduces many new challenges. First, XML keyword search queries do not always return entire documents, but can return deeply nested XML elements that contain the desired keywords. Second, the nested structure of XML implies that the notion of ranking is no longer at the granularity of a document, but at the granularity of an XML element. Finally, the notion of keyword proximity is more complex in the hierarchical XML data model. In this paper, we present the XRANK system that is designed to handle these novel features of XML keyword search. Our experimental results show that XRANK offers both space and performance benefits when compared with existing approaches. An interesting feature of XRANK is that it naturally generalizes a hyperlink based HTML search engine such as Google. XRANK can thus be used to query a mix of HTML and XML documents.
... The first approach, namely the structure-mapping approach, creates relational schemas based on the structure of XML documents (deduced from their DTD, if available). Basically, with this approach a relation is created for each element type in the XML documents, [2,1], and a database schema is defined for each XML document structure or DTD. This is the approach we used in the Xere algorithm. ...
... For both approaches, queries on XML documents are converted into database queries before processing. First, there are simple methods that basically design relational schemas corresponding to every element declaration in a DTD, [2,1]. Other approaches design relational schemas by analyzing DTDs more precisely. ...
Conference Paper
XML (eXtensible Markup Language) is becoming the standard format for documents on the Internet and is widely used to exchange data. Often, the relevant information contained in XML documents needs to be also stored in legacy databases (DB) in order to integrate the new data with the pre-existing ones. In this paper, we introduce a technique for the automatic XML-DB integration, which we call Xere. In particular we present, as the first step of Xere, the mapping algorithm which allows the translation of XML Schemas into Entity-Relationship diagrams.