Conference Paper

XSLT Version 2.0 Is Turing-Complete: A Purely Transformation Based Proof

Abstract and Figures

XSLT is a programming language, originally designed to convert XML documents to XHTML for presentation in browsers. XSLT works by matching predefined patterns against a source XML document and producing output for each kind of construct that is matched. In spite of its relatively humble goals, XSLT has the full power of a Turing machine, i.e., it is “Turing-complete.” We show this by implementing an interpreter for a generic Turing machine in XSLT version 2.0. We use only the constructs available in the official specification of XSLT version 2.0 by the World Wide Web Consortium, with no extensions to the core specification. Furthermore, we do not resort to string functions (which are also available in XSLT) but rely instead on the innate transformational capabilities of XSLT.
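To make concrete what "an interpreter for a generic Turing machine" has to do, here is a minimal sketch in Python rather than XSLT. It is not the authors' construction: the rule format, the names `run_turing_machine` and `INCREMENT_RULES`, the blank symbol `_`, and the step limit are all assumptions chosen for illustration.

```python
from collections import defaultdict

BLANK = "_"  # assumed blank symbol; not taken from the paper


def run_turing_machine(rules, tape, state="q0", halt="halt", max_steps=10_000):
    """Run a single-tape Turing machine and return the final tape contents.

    rules maps (state, symbol) -> (new_state, symbol_to_write, move),
    where move is "L" or "R". The tape is unbounded in both directions.
    """
    cells = defaultdict(lambda: BLANK, enumerate(tape))
    head = 0
    steps = 0
    while state != halt:
        if steps >= max_steps:
            raise RuntimeError("step limit reached without halting")
        symbol = cells[head]
        if (state, symbol) not in rules:
            raise ValueError(f"no rule for state {state!r} and symbol {symbol!r}")
        state, written, move = rules[(state, symbol)]
        cells[head] = written
        head += 1 if move == "R" else -1
        steps += 1
    if not cells:
        return ""
    lo, hi = min(cells), max(cells)
    return "".join(cells[i] for i in range(lo, hi + 1)).strip(BLANK)


# Hypothetical example machine: append one "1" to a unary number.
INCREMENT_RULES = {
    ("q0", "1"): ("q0", "1", "R"),      # skip over the existing 1s
    ("q0", BLANK): ("halt", "1", "R"),  # write a 1 on the first blank, then halt
}

print(run_turing_machine(INCREMENT_RULES, "111"))
```

The interpreter is "generic" in the sense that the machine is passed in as data (the rule table), so the same loop runs any machine. The step limit is only a practical safeguard for the sketch, since halting cannot be decided in general.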
... XML has gained much interest over the last decade, and the aforementioned tasks have been studied in depth [Tatarinov, Viglas, Beyer, Shanmugasundaram, Shekita, and Zhang, 2002; Vansummeren, 2005; Koch, 2006; Onder and Bayram, 2006; Janssen, Korlyukov, and Van den Bussche, 2007; Michiels, 2007; Schmidt, Scherzinger, and Koch, 2007; Debarbieux, Gauwin, Niehren, Sebastian, and Zergaoui, 2013]. Usually these tasks are formally studied using unranked data trees as models for XML documents, characterising all schema, query, and transformation languages by means of tree automata or FO and MSO logics over tree structures [Wei, Li, Rundensteiner, and Mani, 2006; Benedikt, Libkin, and Neven, 2007; Onizuka, 2010; Gauwin and Niehren, 2011]. ...
Article
Inspired by property testing, our objective is to obtain sublinear algorithms for approximately deciding properties of XML databases. More precisely, we investigate whether an unranked tree is valid for a DTD or, more generally, whether it is recognized by a tree automaton. We start with the simpler case of words and consider the approximate membership problem for non-deterministic word automata. For this problem, we provide an efficient tester that runs in time polynomial in the size of the input automaton and the error precision. We also improve the previous [Alon, Krivelevich, Newman, and Szegedy, 2000b] approximate membership tester for regular languages modulo the Hamming distance, so that it runs in time polynomial in the size of the input automaton. Secondly, we study approximate membership testing for tree automata modulo the standard edit distance, and obtain a tester with run time exponential in the depth of the input tree. Next, we consider approximate DTD validity modulo the strong edit distance and provide a tester that depends polynomially on the height of the tree. Finally, modulo the strong edit distance, we prove a linear lower bound on the depth of the input tree.
... This said, nothing prevents a markup language from having prepared commands invoked upon declaration of some defined tags or properties. The XML compliant language XSLT for instance is a Turing-complete language [45]. More on declarative languages in the next section. ...
... Over the last decade, the XML language has become the de facto standard for data exchange for Web applications. XML transformation languages, like XSLT and XQuery, are essential for interoperability of XML-based applications, and their study has become one of the central topics in theoretical and practical research in XML [1,9,21,10,2]. Typical uses of XML transformations include converting an XML document to an XHTML web page, another XML document, or to an unstructured format e.g. ...
Article
We present a Myhill-Nerode theorem for the class of linear deterministic top-down tree-to-word transducers (ltops), which shows that every ltop is equivalent to a unique earliest ltop (eltop) with a minimal number of states. The proof of this theorem is based on a new semantic characterization of transformations definable by ltops that we present. We also show that minimization of eltops is in Ptime as opposed to NP-hardness for general ltops. Our results imply that equivalence testing is in Ptime for eltops and in Exptime for general ltops.
... finite-state concurrent systems such as network and security protocols.2 Albeit with notable exceptions, such as XSLT[11,12], HTML5+CSS3 (shown to be undecidable by virtue of its ability to implement Rule 110[13,14]), and PDF (for many reasons, including its ability to embed Javascript[15]). ...
Article
We present a formal language theory approach to improving the security aspects of protocol design and message-based interactions in complex composed systems. We argue that these aspects are responsible for a large share of modern computing systems' insecurity. We show how our approach leads to advances in input validation, security modeling, attack surface reduction, and ultimately, software design and programming methodology. We cite examples based on real-world security flaws in common protocols, representing different classes of protocol complexity. We also introduce a formalization of an exploit development technique, the parse tree differential attack, made possible by our conception of the role of formal grammars in security. We also discuss the negative impact unnecessarily increased protocol complexity has on security. This paper provides a foundation for designing verifiable critical implementation components with considerably less burden to developers than is offered by the current state of the art. In addition, it offers a rich basis for further exploration in the areas of offensive analysis and, conversely, automated defense tools, and techniques.
... Since XSLT programs are Turing complete [18,26], polynomial exact learning can only be done for subclasses. The tree translation core of XSLT can conveniently be modeled by tree transducers [2,17,19,21]. ...
Article
A generalization from strings to trees and from languages to translations is given of the classical result that any regular language can be learned from examples: it is shown that for any deterministic top-down tree transformation there exists a sample set of polynomial size (with respect to the minimal transducer) from which the translation can be inferred. Until now, similar results had been known only for string transducers and for simple relabeling tree transducers. Learning of deterministic top-down tree transducers (DTOPs) is far more involved because a DTOP can copy, delete, and permute its input subtrees. Thus, complex dependencies between labeled input and output paths need to be maintained by the algorithm. First, a Myhill-Nerode theorem is presented for DTOPs, which is interesting in its own right. This theorem is then used to construct a learning algorithm for DTOPs. Finally, it is shown how our result can be applied to XML transformations (e.g. XSLT programs). For this, a new DTD-based encoding of unranked trees by ranked ones is presented. Over such encodings, DTOPs can realize many practically interesting XML transformations which cannot be realized on first-child/next-sibling encodings. A preliminary extended version can be found at http://www.grappa.univ-lille3.fr/~niehren/Papers/learn-dtop/long.pdf .
Article
Software-as-a-Service (SaaS) is typically defined as a rental model for using a complex software product, running on a centralized computing platform, using a thin client (most frequently a web browser). As such, it is one of the major categories of Cloud Computing, besides IaaS and PaaS. While there are many economic benefits in using SaaS, each company must nevertheless enforce control over its own data processed in the Cloud. One of the most important building blocks of such an enforcement scheme is IdM, for which the industry standard is SAML, the Security Assertion Markup Language. In this paper, we study the security of the SAML implementations of 22 CPs and show that 90% of them can be broken, resulting in company data exposure to attackers on the Internet. The detected vulnerabilities are exploited by a wide variety of attack techniques, ranging from classical web attacks to problems specific to XML processing.
Article
Recently, eXtensible Markup Language (XML) has achieved the leading role among languages for data representation, and we are thus witnessing a massive boom in techniques for managing XML data. Most of the processing techniques, however, suffer from various bottlenecks that worsen their time and/or space efficiency. We assume the main reason is that they consider XML collections too globally, involving all their possible features, although real-world data are often much simpler. Even though some techniques do restrict the input data, the restrictions are mostly unnatural. This paper introduces Analyzer, a complex framework for performing statistical analyses of real-world documents. Exploiting the results of such analyses is a classical way to optimize data processing in many areas. Although this intent is legitimate, ad hoc and dedicated analyses soon become obsolete, are usually built on insufficiently extensive collections, and are difficult to repeat. Analyzer is an easily extensible framework that helps the user with gathering documents, managing analyses, and browsing computed reports.
Conference Paper
The ability to convert between different data formats is important in large and heterogeneous information systems. Although XML was established as a universal standard for data exchange, XML-related languages like XQuery lack the ability to access data in other formats; in particular, relational data and RDF. In this paper, we describe TriQuery, an extension of the XQuery language which adds records (tuples) and RDF-specific operators. Using the statically optimizable record types, relational data as well as the results from RDF sub-queries can be integrated more efficiently than with their traditional encoding using XML elements and attributes.
Conference Paper
We use XSLT to implement an interpreter for a simple XML based imperative programming language called “XIM.” Our work shows that not only is it theoretically possible to use XSLT as a programming language processor, but also that this is practically feasible. This has potential application in the area of delivering executable content over the Internet.
Conference Paper
The World Wide Web Consortium recommends both XSLT and XQuery as query languages for XML documents. XSLT, originally designed to transform XML into XSL-FO, is nowadays a fully grown XML query language that is mostly suited for use by machines. XQuery, on the other hand, was particularly designed to be easily used by humans. Both languages are known to be Turing-complete. We provide here a very simple proof of Turing-completeness of XSLT and XQuery by coding μ-recursive functions, thereby showing that Turing-completeness is a consequence of a few basic and fundamental features of both languages.
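The step that takes recursive functions from primitive recursion to full Turing power is unbounded minimisation (the μ operator). The Python sketch below illustrates that operator only. It is not the encoding used in the abstract above, and the names `mu` and `isqrt` are hypothetical.

```python
def mu(f):
    """Unbounded minimisation: return g with g(*args) = least n such that f(*args, n) == 0.

    The search may run forever if no such n exists, which is exactly
    what makes the resulting functions partial.
    """
    def g(*args):
        n = 0
        while f(*args, n) != 0:
            n += 1
        return n
    return g


# Hypothetical example: integer square root as the least n with (n + 1)^2 > x.
isqrt = mu(lambda x, n: 0 if (n + 1) * (n + 1) > x else 1)

print(isqrt(10))
```

Any language that can express this open-ended search, together with composition and primitive recursion, can compute every μ-recursive function.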
Article
We describe an emerging field, that of nonclassical computability and nonclassical computing machinery. According to the nonclassicist, the set of well-defined computations is not exhausted by the computations that can be carried out by a Turing machine. We provide an overview of the field and a philosophical defence of its foundations.