Modified XML data model  

Source publication
Article
Full-text available
We present an XML-based data model that is deployed in a system for querying corpora with multiple layers of linguistic annotation. The model is based on the simple but effective idea of leaving each layer of annotation intact at annotation time and only relating the layers to each other at query time. Queries select parts of the layers or of the...
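The core idea lends itself to a minimal illustration. The following Python sketch (all names, the character-offset anchors, and the join predicate are assumptions made for illustration, not the system's actual data model) shows two annotation layers that are kept intact and only related to each other when a query asks for it:

```python
# Hypothetical sketch: each layer keeps its own stand-off spans over the
# primary data (the signal); layers are only joined when a query relates them.
from dataclasses import dataclass

@dataclass
class Span:
    start: int   # character offset into the primary data
    end: int
    label: str   # e.g. a part-of-speech tag or a syntactic category

def overlaps(a: Span, b: Span) -> bool:
    """True if two stand-off spans cover at least one common character."""
    return a.start < b.end and b.start < a.end

def relate(layer_a: list[Span], layer_b: list[Span]) -> list[tuple[Span, Span]]:
    """Join two independent annotation layers at query time via their anchors."""
    return [(a, b) for a in layer_a for b in layer_b if overlaps(a, b)]

# Two layers over the same signal "Peter sleeps", left intact until queried:
tokens = [Span(0, 5, "NNP"), Span(6, 12, "VBZ")]
phrases = [Span(0, 12, "S")]
print(relate(tokens, phrases))   # each token paired with the overlapping phrase
```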

Similar publications

Article
Full-text available
Efficient evaluation of XML query languages has become a crucial issue for XML exchange and integration.
Article
Full-text available
This paper provides an objective evaluation of the performance impacts of binary XML encodings, using a fast stream-based XQuery processor as our representative application. Instead of proposing one binary format and comparing it against standard XML parsers, we investigate the individual effects of several binary encoding techniques that are share...
Article
Full-text available
The popularity of multi-core systems has made software parallelization an important way to improve performance. As a mainstream XML query language, XQuery is the core of XML processing. It is critical to take full advantage of multi-core computing to improve XML processing performance through parallelization of XQuery. However, usually it is di...
Article
Full-text available
Providing services by integrating information available in web resources is one of the main goals of a mediation architecture. In this paper, we consider the standard wrapper-mediator architecture under the following hypothesis: (i) the information exchanged between wrappers and the mediator consists of XML documents, (ii) wrappers have limited r...
Article
Full-text available
We propose an XQuery cost model that is able to estimate the performance gain of source-level transformation. The cost of major language constructs, including FLWOR, quantified, path, element construction, and predicate expressions, is captured. The evaluation of optimization using existing real engines suffers from problems, such as lack of applic...

Citations

... However, they do not report on the creation of the annotation. Similarly, Eckart and Teich [14] focus on querying and representation only. Rehm et al. [27] report response times of up to 3 hours for typical queries. ...
... This section examines whether eXist [18] with the AnnoLab extensions [12], MonetDB/XQuery [4,2,19], and Galax/GalaTex [13,9] can provide the facilities given in section 4. The products have been chosen because they implement XQuery, they are freely available, and they support features like stand-off annotations and full-text search - though not necessarily at the same time. An overview of other XQuery implementations, their capabilities, and how they fit in can be found in [22]. ...
... AnnoLab [12] provides additional functions for working with stand-off annotations. These functions allow nodes to be related to each other based on their stand-off anchors, e.g. by testing for overlap or containment. ...
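A small, hypothetical sketch of the kind of anchor-based overlap and containment tests described in this snippet (the function names and the (start, end) tuple representation are assumptions, not the actual extension API):

```python
# Illustrative stand-off anchor predicates: overlap and containment tests
# over (start, end) character offsets into the annotated signal.
Anchor = tuple[int, int]

def overlaps(a: Anchor, b: Anchor) -> bool:
    """Nodes overlap if their anchored regions share at least one character."""
    return a[0] < b[1] and b[0] < a[1]

def contains(outer: Anchor, inner: Anchor) -> bool:
    """An anchored node contains another if it fully covers its region."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

sentence = (0, 12)            # e.g. a syntax node spanning "Peter sleeps"
tokens = [(0, 5), (6, 12)]    # token nodes from a separate layer
print([t for t in tokens if contains(sentence, t)])   # both tokens
```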
Article
Full-text available
XML has become the de-facto standard for representing linguistically annotated corpora. It seems safe to assume that storing and querying an XML-encoded, annotated corpus in an XML database is a straightforward procedure. In reality, however, it is not. This article aims to provide guidelines for deciding whether to use an XML database and how to choose a suitable product. To this end we examine the following questions: Which aspects should be considered before choosing to store an XML-encoded annotated corpus in an XML database? Which facilities does a database need to provide in order to be suitable for storing and querying annotated corpora? Do current XML databases offer these facilities, and, if not, can they be added?
... As each annotation layer is contained in one XML document, a corpus represents a special form of a multi-rooted tree, i. e., a collection of trees that do not share nodes except for the leaves that contain the annotated primary data. AnnoLab (Eckart and Teich, 2007) is an XML/XQuery-based corpus query and management framework that was specifically designed to deal with multi-rooted trees. To avoid problems regarding projectiveness and overlapping segments, AnnoLab uses a stand-off adaptation of the XML data-model. ...
Conference Paper
Full-text available
We present an approach for querying collections of heterogeneous linguistic corpora that are annotated on multiple layers using arbitrary XML-based markup languages. An OWL ontology provides a homogenising view on the conceptually different markup languages so that a common querying framework can be established using the method of ontology-based query expansion. In addition, we present a highly flexible web-based graphical interface that can be used to query corpora with regard to several different linguistic properties such as, for example, syntactic tree fragments. This interface can also be used for ontology-based querying of multiple corpora simultaneously.
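The ontology-based query expansion mentioned in this abstract can be sketched roughly as follows; the concept names, element names, and mapping table are invented for illustration and do not reflect the actual ontology or markup schemes:

```python
# Hypothetical ontology-based query expansion: an ontology concept is mapped
# to the element names used by different XML markup schemes, and a query over
# the concept is expanded into one scheme-specific query per markup language.
CONCEPT_TO_ELEMENTS = {
    "Token": {"tiger": "t", "tei": "w", "custom": "token"},
    "Sentence": {"tiger": "s", "tei": "s", "custom": "sentence"},
}

def expand(concept: str) -> list[str]:
    """Expand a concept-level query into one XPath per known markup scheme."""
    return [f"//{elem}" for elem in CONCEPT_TO_ELEMENTS[concept].values()]

print(expand("Token"))   # ['//t', '//w', '//token']
```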
... For the following example [9] assume an alignment layer en de.align (see figure 3); its segments refer to two signals de (Deutsch, German) and en (English). Another layer en.pos contains token elements that have a pos feature (part-of-speech data for en). ...
... As each annotation layer is contained in one XML document, a corpus represents a special form of a multi-rooted tree, i. e., a collection of trees that do not share nodes except the leaves containing annotated data. AnnoLab [9] is an XML/XQuery-based corpus query and management framework designed to deal with multi-rooted trees. An abstract data-model for corpus annotation was synthesized from various approaches (e. g., [4], [12], [14]) and consists of four tiers: (i) signal tier (annotated data), (ii) structure tier (annotation structure), (iii) feature tier (annotation features), (iv) location tier (a mapping between signal and structure tiers). ...
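The four-tier model summarised in the snippet above can be sketched roughly as follows (all field names and the example layer are assumptions made for illustration, not the framework's actual classes):

```python
# Hypothetical four-tier annotation layer: signal (annotated data), structure
# (annotation structure), feature (annotation features), location (mapping
# between structure and signal).
from dataclasses import dataclass, field

@dataclass
class Node:                                  # structure tier
    id: str
    children: list["Node"] = field(default_factory=list)

@dataclass
class Layer:
    signal: str                              # signal tier: the annotated data
    root: Node                               # structure tier
    features: dict[str, dict[str, str]]      # feature tier: node id -> features
    locations: dict[str, tuple[int, int]]    # location tier: node id -> span

# An "en.pos"-style layer with one token node carrying a pos feature:
en_pos = Layer(
    signal="Peter sleeps",
    root=Node("s1", [Node("tok1")]),
    features={"tok1": {"pos": "NNP"}},
    locations={"tok1": (0, 5)},
)
print(en_pos.features["tok1"]["pos"],
      en_pos.signal[slice(*en_pos.locations["tok1"])])   # NNP Peter
```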
Article
Full-text available
We present an approach for querying collections of heterogeneous linguistic corpora that are annotated on multiple layers using arbitrary XML-based markup languages. An OWL ontology is used to homogenise the conceptually different markup languages so that a common querying framework can be established.
... The back-end hosts the JSP files and related data. It accesses two different databases, the corpus database and the system database, as well as a set of ontologies and additional components. The corpus database is an XML database, extended by the AnnoLab system (Eckart and Teich, 2007), in which all resources and metadata are stored. The system database is a relational database that contains all data about user accounts, resources (i. ...
Article
Full-text available
We present SPLICR, the Web-based Sustainability Platform for Linguistic Corpora and Resources. The system is aimed at people who work in Linguistics or Computational Linguistics: a comprehensive database of metadata records can be explored in order to find language resources that could be appropriate for one's specific research needs. SPLICR also provides a graphical interface that enables users to query and to visualise corpora. The project in which the system is developed aims at sustainably archiving the ca. 60 language resources that have been constructed in three collaborative research centres. Our project has two primary goals: (a) To process and to archive sustainably the resources so that they are still available to the research community in five, ten, or even 20 years' time. (b) To enable researchers to query the resources both on the level of their metadata as well as on the level of linguistic annotations. In more general terms, our goal is to enable solutions that leverage the interoperability, reusability, and sustainability of heterogeneous collections of language resources.
Conference Paper
Data models and encoding formats for syntactically annotated text corpora need to deal with syntactic ambiguity; underspecified representations are particularly well suited for the representation of ambiguous data because they allow for high informational efficiency. We discuss the issue of being informationally efficient, and the trade-off between efficient encoding of linguistic annotations and complete documentation of linguistic analyses. The main topic of this article is a data model and an encoding scheme based on LAF/GrAF (Ide and Romary, 2006; Ide and Suderman, 2007) which provides a flexible framework for encoding underspecified representations. We show how a set of dependency structures and a set of TiGer graphs (Brants et al., 2002) representing the readings of an ambiguous sentence can be encoded, and we discuss basic issues in querying corpora which are encoded using the framework presented here.
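The underspecified encoding of an ambiguous sentence described in this abstract can be illustrated with a rough sketch; it mirrors the idea of storing alternative readings over shared nodes, not the actual LAF/GrAF XML serialisation, and all node and edge labels are invented:

```python
# Hypothetical underspecified encoding: two alternative dependency readings of
# a PP-attachment ambiguity are stored as separate edge sets over the same
# token nodes, so shared structure is encoded only once.
tokens = {"t1": "saw", "t2": "man", "t3": "telescope"}   # shared nodes

shared = [("t1", "t2", "object")]                        # edges in both readings
readings = {
    "r1": [("t1", "t3", "instrument")],   # "with the telescope" attaches to the verb
    "r2": [("t2", "t3", "modifier")],     # "with the telescope" attaches to the noun
}

def edges(reading: str):
    """Resolve one reading: the shared edges plus the reading-specific ones."""
    return shared + readings[reading]

for r in readings:
    print(r, edges(r))
```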
Article
Full-text available
This paper describes salient aspects of the OntoSem lexicon of English, a lexicon whose semantic descriptions can either be grounded in a language-independent ontology, rely on extra-ontological expressive means, or exploit a combination of the two. The variety of descriptive means, as well as the conceptual complexity of semantic description to begin with, necessitates that OntoSem lexicons be compiled primarily manually. However, once a semantic description is created for a lexeme in one language, it can be reused in others, often with little or no modification. Said differently, the challenge in building a semantic lexicon is describing semantics; once the semantics are described, it is relatively straightforward to connect given meanings to the appropriate head words in other languages. In this paper we provide a brief overview of the OntoSem lexicon and processing environment, orient our approach to lexical semantics among others in the field, and describe in more detail what we mean by the largely language-independent lexicon. Finally, we suggest reasons why our resources might be of interest to the larger community.
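The reuse idea described in this abstract can be illustrated with a purely hypothetical sketch; the lexeme identifiers, concept name, and case roles are invented examples, not OntoSem data:

```python
# Hypothetical cross-lingual reuse of a semantic description: once a lexeme's
# semantics are described (here, a link to an ontology concept plus roles),
# head words in other languages can simply be linked to the same description.
SEMANTICS = {"buy-1": {"concept": "BUY", "case_roles": ["agent", "theme"]}}

LEXICONS = {
    "en": {"buy": "buy-1"},
    "de": {"kaufen": "buy-1"},   # reuses the same semantic description
}

def describe(lang: str, word: str) -> dict:
    """Look up a head word and return its shared semantic description."""
    return SEMANTICS[LEXICONS[lang][word]]

print(describe("en", "buy") is describe("de", "kaufen"))   # True: one description
```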