RDF triples generated from an example sentence.

Source publication

Ontology-Based Controlled Natural Language Editor Using CFG with Lexical Dependency

Conference Paper

Jan 2007

In recent years, CNL (Controlled Natural Language) has received much attention with regard to ontology-based knowledge acquisition systems. CNLs, as subsets of natural languages, can be useful for both humans and computers by eliminating ambiguity of natural languages. Our previous work, OntoPath (10), proposed to edit natural language-like narrati...

Context 1

... as subsets of natural languages, have recently received much attention with regard to ontology-based knowledge acquisition systems, for their ability to eliminate the ambiguity of expressions in natural languages. Several studies have been devoted to the use of CNL in ontology-related data processing such as ontology construction, query generation, and data annotation [3][4]. A CNL-based guided look-ahead editor can help users select words that match their intended but vague notions, even without proper knowledge of the sentence structures. Statements controlled by predefined grammars, usually defined in CFG (context-free grammar), a computational notation for natural language structures, can be translated into ontology-referenced data and queries with precision [2][5].

Our previous work, OntoPath, assists editing in an intelligent way: it recognizes the resource type of a description and offers users context-sensitive actions to perform on that description. A domain-specific ontology collects the language constituents, such as nouns and verbs, that are translated into RDF triples [9][10]. A lightweight look-ahead editor guides users, specifically medical experts, in choosing the next words, using the approved grammars and the semantic relations of entities from the ontology. Because most medical sentences follow generally recommended structures to ensure precise knowledge expression, CNL and a look-ahead guiding system can play an important role in such an application.

However, our previous work and other systems have difficulties in enlarging the expression capacity, expanding the grammar, specifying patternized sentences, and adapting informal expressions such as Korean sentences with English words. These difficulties arise because a grammar definition system like CFG captures only the sequential structure of a sentence, not its semantic structure.
These limitations need to be resolved to deal with various sentences, so that users can exploit more familiar expressions and are steered toward the patternized sentences. A newly developed editor, which we propose in this paper, permits grammar definitions through CFG-LD, which includes both sequential and semantic views of sentence structures. Using this grammar definition system, we can define the grammars and the semantic structures of sentences to be used in our editor. The grammar definitions include structural descriptions of grammatical states, which specify sequences of POS (part-of-speech) tags in CFG. Designations of lexical dependency between sentence elements are also included. Using the defined grammars, the implemented CNL editor enables us to obtain structured data from writers’ narratives with 1) more sophisticated expressions, 2) patternized expressions, and 3) informal expressions consisting of multi-language constituents.

We begin this paper with a description of related work on CNL. An explanation of the representation of narratives using RDF triples is provided in section 2. The CFG-LD and its definition rules are discussed in section 3. In sections 4 and 5, we explain the architecture and implementation of the developed editor. Finally, we provide conclusions of this work in section 6.

CNLs are subsets of natural languages with restricted grammars and dictionaries, designed to eliminate the ambiguity and complexity of unrestricted natural languages. Originally, the main purpose of controlled languages was to improve readability for human readers, particularly non-native speakers. An example is AECMA Simplified English, which was created as a description language for aircraft maintenance manuals. Another advantage of CNL is that the reduced complexity improves the text processing capability of computers. Many studies have been done to develop systems that transform written sentences into formal logical expressions.
Some well-known examples are as follows: ACE (Attempto Controlled English) [6], CLCE (Common Logic Controlled English) [7], and PENG-D [2]. ACE is defined in the Attempto project as a controlled natural language that supports automatic translation, via discourse representation structures, into a variant of first-order logic. ACE-based sentences are translated into the Semantic Web querying language PQL [8]. Another example of a controlled natural language is CLCE, which has been developed by Sowa [9]. CLCE, a formal language with an English-like syntax, offers more expressive power than ACE in the sense that it supports an ontology for sets, sequences, and integers, and also allows in-line declarations of words linked to relational databases. It supports the automated translation of written narratives into conceptual graphs or other logical expressions [8]. PENG-D likewise proposes a computer-processable CNL whose sentences are translated into formal logical sentences that are decidable in OWL (Web Ontology Language).

In this subsection, we give an overview of the translation from a sentence to RDF triples, using a gross description narrative from a pathologic examination as an example. The description language supported by OntoPath is compatible with a restricted form of RDF and RDF Schema. The system is designed to annotate the semantic metadata in RDF with vocabularies that are already constrained by a given ontology in RDF Schema. The ontology then guides the generation of medical narratives as RDF documents. The narratives are validated against the syntactic and semantic rules of RDF Schema and are transformed into RDF documents. An RDF triple statement consists of a specific resource, which is an individual primitive semantic element, together with a named property and a value of that property for the resource. The basic RDF model represents the named properties and property values. A property is a rule that gives the expressions their meaning by specifying how a thing should be constituted.
An ontology built as a schema, in a vocabulary description language, provides mechanisms for describing groups of related resources and the relationships between these resources. Instead of defining a class in terms of the properties its instances may have, the ontology describes properties in terms of the resource classes to which they apply. This is the role of the domain and range mechanisms. The example sentence can be translated as shown in Figure 1 when it is typed in the form of the predefined grammar under the guidance of the editor [9][10]. The instances concerning a real patient, ‘a specimen received’ and ‘a cyst’, are conceptualized as instances of the classes Tissue and Cyst, respectively. The properties contains and measures are also specified with their object values in the sentence. This translation can be performed reliably on the example sentence when it is written in a predefined grammar. However, if the user writes the sentence in another way, it cannot be translated successfully, because the translation system assumes restricted grammars and translation processes. To expand the grammars and enlarge the expression capacity, we can add more grammars using CFG, but this is still not enough for the translation work, since the translation may differ depending on the composed structure and the semantic dependencies among the sentence elements.

In this chapter, we introduce CFG-LD, a grammar definition system for describing grammars with lexical dependencies. As we have shown in the previous chapter, the translation between a simple English sentence and an RDF triple is possible through a quite simple translation rule on the grammar. However, it is hard to deal with sentences that have different structures, or that contain annexed expressions such as idioms (e.g., “there is something”) or patternized phrases.
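
As an illustration of the translation and of the domain and range mechanism described above, the following minimal Python sketch represents the example as triples and checks them against a small schema. Only the class names Tissue and Cyst and the property names contains and measures come from the text; the instance identifiers, the domain and range assigned to each property, and the measurement value are assumptions made for this sketch, since Figure 1 is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str    # the resource being described
    predicate: str  # the named property
    obj: str        # the value of that property


# Assumed schema: property -> (domain class, range class).
# The property names come from the text; their domains and ranges are guesses.
SCHEMA = {
    "contains": ("Tissue", "Cyst"),
    "measures": ("Cyst", "Literal"),
}

# Assumed instance typing; the identifiers are hypothetical.
TYPES = {
    "specimen_1": "Tissue",
    "cyst_1": "Cyst",
}


def class_of(term: str) -> str:
    """Return the class of a known instance, or 'Literal' for anything else."""
    return TYPES.get(term, "Literal")


def is_valid(triple: Triple) -> bool:
    """Check a triple against the domain and range of its property."""
    if triple.predicate not in SCHEMA:
        return False
    domain, range_ = SCHEMA[triple.predicate]
    return class_of(triple.subject) == domain and class_of(triple.obj) == range_


# Triples for the example sentence; "2 cm" is a placeholder value.
triples = [
    Triple("specimen_1", "contains", "cyst_1"),
    Triple("cyst_1", "measures", "2 cm"),
]

for t in triples:
    print(t, "valid" if is_valid(t) else "invalid")
```

Describing each property by the classes it applies to, rather than listing properties per class, lets a single lookup table decide whether a candidate triple is acceptable. An editor could use the same table to suggest only the words that are valid at the next position.
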
Other grammatical expressions that follow a different sequence of POS, such as ‘subject-object-verb’, are also hard to handle with the original approach. The sequential and semantic structures of those sentences should be declared to enlarge the translation capacity. Resolving the various structures of sentences may be possible with the previously developed CNL systems listed in the previous chapter. However, their built-in sentence resolution relies mainly on English and is therefore limited for informal expressions consisting of multi-language constituents; it is also hard to gather well-defined CNL grammars written in every desired language. Therefore, in our CNL-based editor, we employ slightly modified grammar expressions named CFG-LD. It informs a lexical parser of both the grammars and the lexical dependencies, so that the parser or system knows the sequential and semantic structures of the grammars, while the ontology provides the language constituents and their domain and range relations.

CFG is a well-known computational notation used to express natural language structure and to ease the development of applications that parse natural language sentences. Chomsky proposed the notion of CFG as a model for describing natural languages with the following four quantities: terminals, non-terminals, productions, and a start symbol [11]. The grammars described below in CFG are simple grammars for parsing an example sentence, ‘Nam is a student supervised by a professor named Kim’, with a set of lexicons enabling aware of terminals’ ...
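
For illustration only, the four quantities of a CFG can be sketched in Python with a toy grammar that recognizes the example sentence. The productions, POS labels, and lexicon below are hypothetical and are not the grammar defined in the source paper, which is truncated in this excerpt. The sketch also covers only the sequential (CFG) part and does not attempt the lexical dependencies of CFG-LD.

```python
# Productions: each non-terminal maps to a list of alternative right-hand sides.
# This grammar is an assumption made for the sketch, not the paper's grammar.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["PN"], ["Det", "N"], ["Det", "N", "RC"]],
    "VP": [["V", "NP"]],
    "RC": [["Vpp", "P", "NP"], ["Vpp", "NP"]],
}

# Lexicon: maps each POS label to the terminals (words) it covers.
LEXICON = {
    "PN": {"Nam", "Kim"},
    "Det": {"a"},
    "N": {"student", "professor"},
    "V": {"is"},
    "Vpp": {"supervised", "named"},
    "P": {"by"},
}

START = "S"  # start symbol


def ends(symbol, tokens, pos):
    """Return every position where `symbol` can end if it starts at `pos`."""
    if symbol in LEXICON:
        # POS label: consume exactly one matching word.
        if pos < len(tokens) and tokens[pos] in LEXICON[symbol]:
            return {pos + 1}
        return set()
    result = set()
    for production in GRAMMAR[symbol]:
        positions = {pos}
        for part in production:
            # Advance every partial match through the next part of the production.
            positions = {e for p in positions for e in ends(part, tokens, p)}
        result |= positions
    return result


def accepts(sentence):
    """True if the whole sentence can be derived from the start symbol."""
    tokens = sentence.split()
    return len(tokens) in ends(START, tokens, 0)


print(accepts("Nam is a student supervised by a professor named Kim"))
```

Tracking all possible end positions, rather than committing to the first matching production, keeps the recognizer correct when one production is a prefix of another, as with the two noun-phrase alternatives that start with a determiner. The toy grammar has no left recursion, so the recursion terminates.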

Finding the story: broader applicability of semantics and discourse for hypermedia generation

Article

Jul 2003

Generating hypermedia presentations requires processing constituent material into coherent, unified presentations. One large challenge is creating a generic process for producing hypermedia presentations from the semantics of potentially unfamiliar domains. The resulting presentations must both respect the underlying semantics and appear as coheren...

Expansion of a relational database to support semantic web queries

Article

Jun 2017

Building ontologies for different natural languages

Article

Jun 2014
Comput Sci Inform Syst

Ontology construction of a certain domain is an important step in applying the Semantic Web. A number of software tools adapted for building domain ontologies of the most widespread natural languages are available, but accomplishing that for any given natural language presents a challenge. Here we propose a semi-automatic procedure to create ontologies for different natural languages. Our approach utilizes various software tools available on the Internet, most notably DODDLE-OWL, a domain ontology development tool implemented for English and Japanese languages. By using this tool, WordNet, Protégé and XSLT transformations, we propose a general procedure to construct a domain ontology for any natural language.

A Survey and Classification of Controlled Natural Languages

Article

Mar 2014
COMPUT LINGUIST

Tobias Kuhn

What is here called controlled natural language (CNL) has traditionally been given many different names. Especially during the last four decades, a wide variety of such languages have been designed. They are applied to improve communication among humans, to improve translation, or to provide natural and intuitive representations for formal notations. Despite the apparent differences, it seems sensible to put all these languages under the same umbrella. To bring order to the variety of languages, a general classification scheme is presented here. A comprehensive survey of existing English-based CNLs is given, listing and describing 100 languages from 1930 until today. Classification of these languages reveals that they form a single scattered cloud filling the conceptual space between natural languages such as English on the one end and formal languages such as propositional logic on the other. The goal of this article is to provide a common terminology and a common model for CNL, to contribute to the understanding of their general nature, to provide a starting point for researchers interested in the area, and to help developers to make design decisions.

Translating Natural Language Competency Questions into SPARQL Queries: a Case Study

Conference Paper

Jan 2013

OWLPath: an OWL ontology-guided query editor

Article

Feb 2011
IEEE T SYST MAN CY A

Most Semantic Web technology-based applications need users to have a deep background on the formal underpinnings of ontology languages and some basic skills in these technologies. Generally, only experts in the field meet these requirements. In this paper, we present OWLPath, a natural language-query editor guided by multilanguage OWL-formatted ontologies. This application allows nonexpert users to easily create SPARQL queries that can be issued over most existing ontology storage systems. Our approach is a fully fledged solution backed with a proof-of-concept implementation and the empirical results of two challenging use cases: one in the domain of e-finance and the other in e-tourism.

Controlled English for Knowledge Representation

Article

Nov 2010

Tobias Kuhn

Knowledge representation is a long-standing research area of computer science that aims at representing human knowledge in a form that computers can interpret. Most knowledge representation approaches, however, have suffered from poor user interfaces. It turns out to be difficult for users to learn and use the logic-based languages in which the knowledge has to be encoded. A new approach to design more intuitive but still reliable user interfaces for knowledge representation systems is the use of controlled natural language (CNL). CNLs are subsets of natural languages that are restricted in a way that allows their automatic translation into formal logic. A number of CNLs have been developed but the resulting tools are mostly just prototypes so far. Furthermore, nobody has yet been able to provide strong evidence that CNLs are indeed easier to understand than other logic-based languages. The goal of this thesis is to give the research area of CNLs for knowledge representation a shift in perspective: from the present explorative and proof-of-concept-based approaches to a more engineering-focussed point of view. For this reason, I introduce theoretical and practical building blocks for the design and application of controlled English for the purpose of knowledge representation. I first show how CNLs can be defined in an adequate and simple way by the introduction of a novel grammar notation and I describe efficient algorithms to process such grammars. I then demonstrate how these theoretical concepts can be implemented and how CNLs can be embedded in knowledge representation tools so that they provide intuitive and powerful user interfaces that are accessible even to untrained users. Finally, I discuss how the understandability of CNLs can be evaluated. I argue that the understandability of CNLs cannot be assessed reliably with existing approaches, and for this reason I introduce a novel testing framework. 
Experiments based on this framework show that CNLs are not only easier to understand than comparable languages but also need less time to be learned and are preferred by users.

Enabling Intelligent Service Discovery with GGODO

Article

Jun 2010
J INF SCI ENG

The Web has changed from a mere repository of information to a new platform for business transactions where organizations deploy, share and expose business processes via Web services. New promising application fields such as the Semantic Web and Semantic Web Services are leveraging the potential of deploying those services, but face the problem of discovering and invoking them in a simple way for common users. GGODO is an experimental solution that combines natural language analysis and semantically-empowered techniques to let users express their goals in a guided way, which produces better results than previous non-guided tools.

Enabling Intelligent Service Discovery with GGODO

Article

Jan 2010
J INF SCI ENG

The Web has changed from a mere repository of information to a new platform for business transactions where organizations deploy, share and expose business processes via Web services. New promising application fields such as the Semantic Web and Semantic Web Services are leveraging the potential of deploying those services, but face the problem of discovering and invoking them in a simple way for common users. GGODO is an experimental solution that combines natural language analysis and semantically empowered techniques to let users express their goals in a guided way, which produces better results than previous non-guided tools.

A Semantic Query Interface for the OGO Platform

Conference Paper

Jan 2010

In recent years, a number of semantic biomedical systems have been developed to store biomedical knowledge in an accessible manner. However, their practical usage is limited, since they either require the user to have expertise in semantic languages or, on the other hand, their query interfaces do not fully exploit the semantics of the knowledge represented. Such drawbacks were present in the OGO system, a resource that semantically integrates knowledge about orthologs and human genetic diseases, developed by our research group. In this paper, we present an extension of the OGO system for improving the process of designing advanced semantic queries. The query module requires the users to know and to manage only the OGO ontology, which represents the domain knowledge, simplifying the process of query building.

On Designing Controlled Natural Languages for Semantic Annotation

Conference Paper

Jun 2009

Manual semantic annotation is a complex and arduous task, both time-consuming and costly, often requiring specialist annotators. (Semi)-automatic annotation tools attempt to ease this process by detecting instances of classes within text and relationships between instances; however, their usage often requires knowledge of Natural Language Processing (NLP) or formal ontological descriptions. This challenges researchers to develop user-friendly annotation environments within the knowledge acquisition process. Controlled Natural Languages (CNLs) offer an incentive to the novice user to annotate, while simultaneously authoring, his/her respective documents in a user-friendly manner, yet shielding him/her from the underlying complex knowledge representation formalisms. CNLs have already been successfully applied within the context of ontology authoring, yet very little research has focused on CNLs for semantic annotation. We describe the design and implementation of two approaches to user-friendly semantic annotation, based on Controlled Language for Information Extraction tools, which permit non-expert users to semi-automatically both author and annotate meeting minutes and status reports using controlled natural language.
