Answering Engineers’ Questions Using Semantic Annotations
Paper Number: 06
Question-Answering (QA) systems have proven to be helpful especially to those who feel
uncomfortable entering keywords, sometimes extended with search symbols such as +, *, etc. In
developing such systems, the main focus has been on the enhanced retrieval performance of searches,
and recent trends in QA systems centre on the extraction of exact answers. However, when their
usability was evaluated, some users indicated that they found it difficult to accept the answers due to
the absence of supporting context and rationale. Current approaches to address this problem include
providing answers with linking paragraphs or with summarising extensions. Both methods are
believed to be sufficient to answer questions seeking the names of objects or quantities that have only
a single answer. However, neither method addresses the situation when an answer requires the
comparison and integration of information appearing in multiple documents or in several places in a
single document. This paper argues that coherent answer generation is crucial for such questions and
that the key to this coherence is to analyse texts to a level beyond sentence annotations. To
demonstrate this idea, a prototype has been developed based on Rhetorical Structure Theory and a
preliminary evaluation has been carried out. The evaluation indicates that users prefer to see the
extended answers that can be generated using such semantic annotations, provided that additional
context and rationale information are made available.
Keywords: Information retrieval, question-answering, semantic annotations, natural language
processing, Rhetorical Structure Theory
1 Introduction
Electronic documents are one of the most common information sources in organisations and
approximately 90% of organisational memory exists in the form of text-based documents. It has been
reported that 35% of users find it difficult to access information contained in these documents and at
least 60% of the information that is critical to these organisations is not accessible using typical search
tools (80-20 software, 2003). There are two main problems. The first is that there is simply too much
information to be searched. The second is that differences exist between the indexing approaches used
in search engines and the way people perceive and access the contents of documents. This means that
most users find searching for relevant information difficult since it is not possible for them to enter
keywords in a sufficiently precise form for them to be used effectively by current search engines.
Current retrieval systems accept queries from users in the form of a few keywords and retrieve a long
list of matching documents. Users then have to sift through the documents to locate the information
they are looking for. For simple fact-based queries, e.g. What material should be used for this turbine
blade? most users can enter satisfactory keywords that rapidly find the required answers. However,
keyword-based systems cannot cope with questions involving: (1) comparing, e.g. What are the
advantages and disadvantages of using aluminium compared with steel for this saucepan? (2)
reasoning, e.g. How safe are commercial flights? and (3) extracting answers from different documents
and fusing them into a complete answer. In order to obtain useful answers to these types of question,
users currently have to expend considerable time and effort.
In previous research in this domain, automatic query expansion and taxonomy-based searches have
been proposed. Query expansion improves on keyword searching for short questions that require only
a few documents to be located to provide the answers. Taxonomy-based searches require the
hierarchical organisation of domain concepts. This relieves the user of having to enter accurate
keywords as information is searchable by selecting concepts. However, considerable effort is required
to create the classifications and maintain the hierarchy. For example, Yahoo 1 employs around 50
subject experts to maintain its directories and indexes. Few organisations can afford to adopt
such a strategy, and current automatic classifiers only achieve 60-80% accuracy (Mukherjee & Mao,
2004). This means that in automatic classification around a third of the documents will be missed or
misclassified. Manual classifications are subjective, and often based on the few sentences which are
deemed important for individual indexers. Clearly when information is sought that does not match the
indexing, such a taxonomy-based approach does not help.
Offering users the facility to enter their queries in natural language might greatly enhance current
search engine interfaces and be particularly helpful for less experienced users who are not adept at
advanced keyword searches. Recent research into natural language-based retrieval systems has mainly
pursued a Question-Answering (QA) approach. QA systems have successfully retrieved short answers
to natural language questions instead of directing users to a number of documents that might contain
the answers. Typically, QA uses a combination of Information Retrieval (IR) and Natural Language
Processing (NLP) techniques. IR techniques are used to pinpoint a subset of documents and to locate
parts of those documents that are related to the questions. NLP techniques are used for extracting brief
answers. There is great interest in developing robust and reliable QA systems to exploit the enormous
quantity of information available on-line in order to answer simple questions such as: Who is the
president of the USA? This is relatively easy since straightforward NLP techniques, such as pattern-
matching, are sufficient to answer it. The numerous occurrences and multiple reformulations of the
same information available on the Web greatly increase the chance of finding answers that are
syntactically similar to the question (Brill et al., 2001). On the Intranets run by organisations, the
quantity of information, although large, is much less than on the Web, and the number of occurrences
and reformulations is far smaller.
1 http://www.yahoo.com
Unlike users searching on the Web, those in organisations are likely to ask questions that are not easily
answered by simply looking up syntactic similarities in databases. Such answers can be considered
complex and may need to be inferred from different parts of a single text or from multiple texts. The
initial question posed by a user may be ill-formed, i.e. too broad or too specific, making it difficult for
the retrieval system to interpret and hence further interaction with the user is often necessary.
Answering such complex questions has received little attention within QA research. To answer such
questions, the issues of correctly interpreting the question and presenting the answer must both be
addressed. When presenting answers to complex questions, it is not sufficient just to present the
answer, i.e. the user needs additional supporting information with which to assess the trustworthiness
of the answer. For example, hemlock poisoning and drinking hemlock can both be considered correct
answers to the question: How did Socrates die? (Burger et al., 2001). Users who have some
background knowledge about poisoning might appreciate a brief answer, as they do not want to read
through a long text to extract the answer themselves. Other users with less background knowledge
might prefer to see where the answers came from and want to read more text explaining the answers
before accepting them. Searching precisely for How did Socrates die? on the Web using Google
produces around 748,000 results. It is clear that, for a question such as this, answers with varying
formulations appear in numerous documents or in many parts of a single document. Answers
appearing as multiple instances need to be fused efficiently in order to reduce repeated information.
Apart from what is simply stated in the question, the user’s real intention might have been to know the
reason why Socrates chose to die by poisoning. For questions that are ill-formed, it is important that
answers are extended with related information that increases a user’s understanding of the answers. It
is therefore necessary to research suitable ways of presenting answers in a clear and coherent manner,
and providing sufficient supporting information to allow users to decide whether or not to trust the
answers.
A combination of two approaches, both using semantic relations, is therefore proposed for presenting
clear and coherent answers. First, duplicate information is removed. Second, answers are synthesised
from multiple occurrences, and then justified by adding supporting information. In order to achieve
this, semantic analysis is necessary of both the questions and the texts from which the answers are to
be extracted. Figure 1 shows an example of how these ideas might be implemented2. The initial
question posed by the user, i.e. What triggered the engine fire alarm in Boeing 727-217?, was aimed
at understanding the cause or causes of the fire alarm going off. The question itself is ambiguous
since it specifies neither the engine nor the date of the flight. Assuming that the system correctly
understands the question, it can return the failure of the number 2 engine starter as the cause.
However, for complex incidents such as this one, it is difficult to pinpoint particular causes and regard
them as independent of the remaining information. That is, there might be more than one cause for a
single incident, and some causes may depend on other causes. For example, in the example above
there are other contributing causes, e.g. the start valve had re-opened because of a short circuit or the
engine starter had failed due to over-speeding. There could also be consequent effects, e.g. residual
smoke and fire damage to the structure surrounding the number 2 engine. For presenting answers like
this, it is necessary to consider the actual information needs of the user at the knowledge level. For
example, when faced with an unexpected observation (problem), engineers first assess whether or not
the problem is serious and requires diagnosis. Diagnosis normally proceeds by finding reasons or
causes that impact on the observation. Once the causes are identified, then it is likely that solutions
are required to prevent recurrences. The impact of making various hypothetical changes is likely to be
assessed, along with the advantages and disadvantages of the various solutions proposed. This
example demonstrates that the information needs for users in specific organisations are complex,
requiring not only sophisticated retrieval processing but also the presentation of retrieval results in as
natural a form as possible. Successful synthesis and presentation of such answers depend on the
ability to compare information on a semantic-level such that it produces a chain of semantic relations.
2 An example text is from http://www.tsb.gc.ca/en/reports/air/1996/a96o0125/a96o0125.asp
Figure 1. An example of generating a coherent and justified answer
A prototype of semantic-based QA system implementing these ideas has been developed. The
underlying approach is based on identifying various discourse relationships between two spans, such
as cause-effect and elaboration. These types of relationship are derived from a computational
linguistic theory known as Rhetorical Structure Theory (RST). This theory defines a set of rhetorical
relations and uses them to describe how the sentences are combined to form a coherent text (Mann &
Thompson, 1988). As such, RST analysis discovers relationships within a sentence or among
sentences. Since sentences are not usually comprehensible when isolated, this approach provides a
more sophisticated content analysis. These annotations are then used to remove duplicate information
and synthesise answers from multiple occurrences. Finally these answers are justified by adding
supporting information. As information is compared at the semantic-level rather than at the string
level, it is possible to determine whether a causal link exists between two events. This paper mainly
addresses questions related to causal inference and describes a prototype system to test the ideas. The
proposed system is targeted at the engineering area, however the methodology is generic and can be
applied to other domains.
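The causal inference described above can be sketched as a chain of cause-effect relations. The relation tuples and the `causal_chain` function below are illustrative assumptions, not the prototype's implementation; the event strings paraphrase the starter-failure example from Figure 1.

```python
# Hypothetical cause-effect pairs assumed to be extracted from RST annotations.
# Each tuple reads (cause, effect); the events paraphrase the Boeing 727-217 case.
relations = [
    ("short circuit in the engine wiring harness",
     "number 2 engine start valve re-opened"),
    ("number 2 engine start valve re-opened",
     "number 2 engine starter over-sped"),
    ("number 2 engine starter over-sped",
     "failure of the number 2 engine starter"),
    ("failure of the number 2 engine starter",
     "engine fire alarm triggered"),
]

def causal_chain(effect, relations):
    """Walk cause-effect pairs backwards from an observed effect to its root cause."""
    causes = {eff: cause for cause, eff in relations}
    chain = [effect]
    while chain[-1] in causes:
        chain.append(causes[chain[-1]])
    return list(reversed(chain))

print(" -> ".join(causal_chain("engine fire alarm triggered", relations)))
```

Comparing events at the semantic level (rather than as strings) is what would let a real system recognise that two differently worded spans describe the same link in this chain.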
2 Literature Review
Users find QA systems helpful as they do not need to go through each retrieved document to extract
the information they need. Until recently, most QA systems only functioned on specifically created
collections and on limited types of questions, but some attempts have been made to scale systems to
open domains like the Web (Kwok et al., 2001). Experiments show that in comparison to search
engines, e.g. Google, QA systems significantly reduce the effort to obtain answers. AskJeeves3 and
3 www.ask.com
Question: What triggered the engine fire alarm in Boeing 727-217?

Answer: The failure of the number 2 engine starter caused a fire, as the investigation revealed
residual smoke and fire damage to the structure surrounding the number 2 engine.

Extended answer: The engine start valve master switch did not protect the complete circuit, →
{causing a short circuit in the engine wiring harness, so a new voltage must subsequently have been
available}, → allowing the number 2 engine start valve to re-open, → {causing the number 2 engine
starter to over-speed, because it was being rotated by the air turbine with no load on the starter},
→ causing the failure of the starter, as evidenced by a two- by three-inch hole in the side of the
starter gear case and the air turbine having come out through the retaining screen.
Brainboost4 are examples of Internet search engines with a QA interface, but neither provides
fully-fledged QA capabilities. AskJeeves relies on hand-crafted question templates that enable
automatic answer searches, and returns lists of documents instead of intelligently extracting brief
answers. Brainboost supplies answers in plain English, but the correctness of its answers is limited to
specific questions only, and for many questions neither relevant texts nor exact answers are found.
Currently, developments in QA have focused on improving system performance through more
advanced algorithms for extracting exact answers (Voorhees, 2002). A project organised by the US
National Institute of Standards and Technology (NIST) has established benchmarks for evaluating QA
systems. Two new QA system response requirements were introduced in 2002: (1) to return an exact
answer; and (2) to return only one answer. Previous requirements had allowed systems to return five
candidate answers, and the answers could be between 50 and 250 bytes in length. This demonstrates
that current QA systems are focusing on retrieving exact answers to factual questions. For these
systems, performances of over 80% correct answers have been reported. However, user evaluations
consistently highlight that usability is hindered by the absence of the context information that
would allow users to evaluate the trustworthiness of an answer. For example, user studies conducted
by Lin et al. (2003) suggest that users prefer to see the answer in a paragraph rather than as an exact
answer, even for a simple question like: Who was the first man on the Moon?
In comparison to open-domain QA systems, e.g. on the Web, domain-specific QA systems have the
following additional characteristics (Diekema et al., 2004; Hickl et al., 2004; Nyberg et al., 2005):

- a limited amount of data is available in most cases;
- domain-specific terminologies have to be dealt with;
- user questions are complex.
4 www.brainboost.com
Shallow text processing methods are mostly used for QA systems on the Web due to Web redundancy,
which means that similar information is stated in a variety of ways and repeated in different locations.
However, in the engineering domain, suitable data can be scarce and answers to some questions might
only be found in a few documents and these may exhibit linguistic variations from the questions.
Therefore, intensive NLP techniques that can analyse unstructured texts using semantics and domain
models are more appropriate. Domain ontologies and thesauri are required to define domain-specific
terminologies. Hai and Kosseim (2004) used information in a manually created thesaurus to rank
candidate answers by annotating the special terms occurring both in the queries and candidate
answers. They also used a concept hierarchy for measuring similarities between a document and a
query. Ontologies have also been used for expanding terms in the questions and clarifying ambiguous
terms (Nyberg et al., 2004). Since ontologies can be regarded as storing information as triples, e.g.
person – work-for – organisation, users can submit questions linked to such classes and relations in
natural language (Lopez et al., 2005). For example, the question: Is John an employee of IBM? can be
answered by recognising: (1) John is a person; (2) IBM is an organisation; and (3) employee is
inferred from ‘someone who works for an organisation’. Questions other than factual ones need
special attention and a profile of the user can help to improve system performance (Diekema et al.
2004). Suitable ways of presenting answers and how much information should be provided must also
be determined. To address these problems, some researchers proposed interactive QA. To that end,
some QA systems rephrase the questions submitted to confirm whether or not users’ information needs
have been correctly identified (Lin et al., 2003). Advanced dialog implementations have also been
suggested. However, Hickl et al. (2004) argue that the decomposition of user questions into simpler
ones with which answer types are associated could be a more practical solution than a dialog
interaction.
Generally, semantic annotations are treated as a similar task to named-entity recognition that identifies
domain concepts and their associations in a single sentence (Aunimo & Kuuskoski, 2005). This paper
extends the notion of semantic annotation to include discourse relations that identify what information
is generated from the extended sequences of sentences. This goes beyond the meanings of individual
sentences by using their context to explain how the meaning conveyed by one sentence relates to the
meaning conveyed by another. A discourse model is essential for constructing computer systems
capable of interpreting and generating natural language texts. Such models have been used: to assess
student essays with respect to their writing skills; to summarise scientific papers; to extend the
answers to a user’s question with important sentences; and to generate personalised texts customised
for individual reading abilities (Bosma, 2005; Burstein et al., 2003; Teufel, 2001; Williams & Reiter,
2003).
3 Engineering Taxonomy
Retrieval systems in engineering need to employ domain-specific terminologies that differentiate
between specific and general terms. Specific terms are essential to understand users’ questions and
characterise documents in relation to those questions. Some general terms have specific meanings in
engineering. For example the term shoulder has multiple meanings in a dictionary, and in most cases,
it means the part of the body between the neck and the upper arm. However, in engineering, it can
refer to a locating upstand on a shaft. Domain taxonomies arrange such terms into a hierarchy. An
example of an engineering taxonomy is the Engineering Design Integrated Taxonomy (EDIT) and this
taxonomy is used throughout this paper. It consists of four root concepts (Ahmed, 2005):

- The design process, i.e. a description of the different tasks undertaken at each stage of product
  development, e.g. conceptual design, detail design, brainstorming.
- The physical product to be produced, e.g. assemblies, sub-assemblies and components, using
  part-of relations. For example, a motor and the shaft of a motor.
- The functions that must be fulfilled by the particular component or assembly. For example,
  one of the functions of a compressor disc is to secure the compressor blade and one of the
  functions of a cup is to contain liquid.
- The issues, namely the considerations, that a designer must take into account when carrying
  out a design process, e.g. considering the unit costs or production processes.
A detailed description of the development of a generic methodology to develop engineering design
taxonomies that was used for EDIT can be found in (Ahmed et al., 2005).
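As a rough illustration, the four EDIT root concepts could be held in a small lookup structure. The child terms below are only the examples mentioned in the text; the structure and function names are our own and EDIT itself is of course far larger and hierarchical.

```python
# Illustrative fragment of the EDIT root concepts as a flat dictionary.
# Real EDIT arranges many more terms into a deeper hierarchy.
edit_taxonomy = {
    "design process": ["conceptual design", "detail design", "brainstorming"],
    "physical product": ["assembly", "sub-assembly", "component"],
    "function": ["secure compressor blade", "contain liquid"],
    "issue": ["unit cost", "production process"],
}

def root_of(term, taxonomy):
    """Return the root concept under which a term is classified, if any."""
    for root, children in taxonomy.items():
        if term in children:
            return root
    return None

print(root_of("brainstorming", edit_taxonomy))  # -> design process
```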
4 The Proposed Method
In general, a document can be encoded with various semantics, e.g. customer reviews or causal
accounts of engineering failures, and accessed by users who have very different interests. For
example in the case of product reviews by customers, negative and positive customer opinions are the
main messages for market researchers. On the other hand, designers are more interested in design-
related issues, comments and problems associated with engineering failures. It would therefore be
beneficial to include those semantics that facilitate searching for information in a way that reflects the
interests of the users. For example, for a designer whose task is to reduce fan noise, guidance on how
to minimise aerodynamic noise should be retrieved. On the other hand, if that designer is more
interested in using a specific method for noise reduction, then documents describing the methods
along with their advantages or disadvantages are more useful. With keyword-based indexing, it is not
feasible to extract such semantics since most natural language texts have annotations that are too basic
and no explicit descriptions of the concepts are available. Annotations are formal notes attached to
specific spans of text. Their complexity and representation depend on the mark-up language used.
The proposed method works as follows: (1) a document is annotated with a set of relations derived
from RST; (2) the document is classified with EDIT indexes; (3) the document is parsed using NLP
indexing techniques; (4) the RST-annotated document is converted into predicate-argument forms for
effective answer extraction; and (5) a user question is analysed using the same NLP technique. Steps
(1), (2), and (3) can proceed independently, but step (1) must precede step (4).
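The five-step flow can be sketched as below. All function names and stub bodies are placeholders of our own, not the prototype's code; only the ordering constraint between steps (1) and (4) is taken from the text.

```python
# Sketch of the proposed indexing pipeline. Each stub stands in for a
# substantial component (RST annotator, EDIT classifier, NLP parser).

def annotate_rst(document):          # step (1): RST annotation
    return {"text": document, "rst": []}

def classify_edit(document):         # step (2): EDIT indexing
    return {"edit_indexes": []}

def parse_nlp(text):                 # steps (3)/(5): NLP parsing of document or question
    return {"tokens": text.split()}

def to_predicate_argument(rst_doc):  # step (4): consumes step (1)'s output
    return [("predicate", rst_doc["rst"])]

def index_document(document):
    rst_doc = annotate_rst(document)        # (1)
    edit = classify_edit(document)          # (2) independent of (1)
    parsed = parse_nlp(document)            # (3) independent of (1)
    preds = to_predicate_argument(rst_doc)  # (4) must follow (1)
    return {"rst": rst_doc, "edit": edit, "parsed": parsed, "predicates": preds}

index = index_document("The starter failed as a result of chafing.")
```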
4.1 Semantic annotations based on RST
Discourse Analysis (DA) is crucial for constructing computer systems capable of interpreting and
generating natural language texts. DA studies the structure of texts beyond sentence and clause levels,
and structures the information extracted from the texts with semantic relations. It is based on the idea
that well-formed texts exhibit some degree of coherence that can be demonstrated through discourse
connectivity, i.e. logical consistency and semantic continuity between events or concepts. This is in
contrast with most keyword-based indexing that exclusively addresses the sub-sentence level, omitting
the fact that sentences are inter-connected to create a whole text. In order to establish a more robust
and linguistically informed approach to identify important entities and their relations, a deeper
understanding is necessary.
Annotating a text with a discourse structure requires advanced text processing, linguistic resources
such as taxonomies, and, possibly, manual intervention by experts. It certainly increases the work
required to develop QA systems. However, if QA systems are only targeted at certain domains, where
a limited number of texts has to be searched, and experts are available to assist, then detailed linguistic
analysis is feasible. DA generates a discourse structure by defining discourse units, either at sentence
or clause level, and assigning discourse relations between the units. Discourse structures can reveal
various text features and attempts have been made to use them to identify important sentences that are
key to understanding the contents of documents (Kim et al., 2006b; Marcu, 1999). Discourse
structures can be used to compare units in multiple documents in order to evaluate similarities and
differences in their meanings, as well as to detect anomalies, duplications and contradictions.
Rhetorical relations are central constructs in RST and convey an author’s intended meaning by
presenting two text spans side by side. These relations are used to indicate why each of the spans was
included by the author and to identify the nucleus spans that are central to the purpose of the
communication. Satellite spans depend on the nucleus spans and provide supporting information.
Nucleus spans are comprehensible independently of the satellites. For example, consider the
following two text spans: (1) Given that the clutch was functional, and (2) it is unlikely that the engine
was driving the starter. A condition relation is identified, with span (2) being the nucleus. Satellite
span (1) is only used to define the condition in which the situation in span (2) occurs. These two spans
are coherent since the person who reads them can establish their relationship.
Rhetorical relations between spans are constrained in three ways: (1) constraints on a nucleus; (2)
constraints on a satellite; and (3) constraints on the link between a nucleus and a satellite. They are
elaborated in terms of the intended effect on the text reader. If an author presents an argument in a
text that is identified as an evidence relation, then it is clear that the author was intending to increase a
reader’s belief in the claim represented in a nucleus span by presenting supporting evidence in a
satellite span. Such relations are identified by applying a recursive procedure to a text until all
relevant units are represented in an RST structure (Taboada & Mann, 2006). The procedure has to be
recursive because the intended communication effect may need to be expressed in a complex unit that
includes other relations. The results of such analyses are RST structures typically represented as
trees, with one top-level relation encompassing other relations at lower levels.
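Such a recursive nucleus/satellite tree might be represented as follows. The class layout is an assumption on our part; the condition example reuses the clutch/starter spans quoted above.

```python
# A minimal recursive representation of an RST structure: a Relation node
# whose nucleus or satellite may itself be another Relation.
from dataclasses import dataclass
from typing import Union

@dataclass
class Span:
    text: str

@dataclass
class Relation:
    kind: str
    nucleus: Union[Span, "Relation"]
    satellite: Union[Span, "Relation"]

tree = Relation(
    kind="condition",
    nucleus=Span("it is unlikely that the engine was driving the starter"),
    satellite=Span("Given that the clutch was functional"),
)

def nucleus_text(node):
    """Recursively follow nuclei down to the span central to the communication."""
    while isinstance(node, Relation):
        node = node.nucleus
    return node.text
```

Following nuclei recursively reflects the RST idea that nucleus spans remain comprehensible when their satellites are stripped away.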
It is difficult to determine the correct number of relations to be used and their types. In the simplest
domains only two relation types may be required, whereas some complex domains may require over
400 (Hovy, 1993). Hovy argued that taxonomies with numerous relation types represent sub-types of
taxonomies with fewer types. Some relation types are difficult to distinguish, e.g. elaboration and
example. If there are too many types, inconsistencies of annotation are likely. If there are too few, it
may not be possible to capture all the different types of discourse. Mann and Thompson (1988), for
example, listed 33 relation types to annotate a wide range of English texts. To reduce inconsistencies
of annotation, our method combines similar relation types and eliminates those that do not appear
frequently. A preliminary examination with sample engineering domain data from aircraft incident
reports (see Section 5.1) resulted in the following nine types: background, cause-effect, condition,
contrast, elaboration, evaluation, means, purpose, and solutionhood. Each of them is described
below, along with an example taken from the sample domain data, i.e. aircraft incident reports, using
(N) to indicate a nucleus span and (S) a satellite span.
Background
This type of relation is used to increase a reader’s background understanding (S) of the nucleus span
(N).
(S) While the helicopter was approximately 25 feet above ground level en route to Tobin Lake,
Saskatchewan, to pick up a bucket of water
(N) the engine fire warning light came on and the pilot saw smoke coming out of the engine
cowling.
Cause-Effect
This type of relation is used to link the cause in the nucleus span to the effect in the satellite
span or vice versa.
(N-S) Analysis of the fuel hose indicates that the steel braid strands failed
(S-N) as a result of chafing.
Condition
This type of relation is used to show the condition (S) under which a hypothetical situation (N) might
be realised.
(S) Given that the clutch was functional,
(N) it is unlikely that the engine was driving the starter.
Contrast
This type of relation is used to contrast incompatibilities between situations, opinions, or events and
there is no distinction between (N) and (S).
(N-S) It is considered likely that the fire was momentarily suppressed
(N-S) but because of the constant supply of fuel and ignition, it re-ignited after the retardant was
spent.
Elaboration
This type of relation is used to elaborate (S) on the situation in (N).
(N) The variable inlet guide vane actuator (VIGVA) hose, which provides fuel pressure to open
the variable guide vanes, was found pinched between the top of the starter/generator and the
impeller housing assembly.
(S) Further inspection of the pinched fuel hose revealed a hole through the steel braiding and
inner lining.
Evaluation
This type of relation is used to provide an evaluation (S) of the statement in (N).
(N) The second option, the engine start valve master switch,
(S) does not provide a positive indication to the flight crew of the start valve operation.
Means
This type of relation is used to explain the means (S) by which (N) is realised.
(N) The rest of the fire was extinguished
(S) using a fire truck that arrived on the site.
Purpose
This type of relation is used to describe the purpose (S) achieved through (N).
(N) At 13.5 flight hours prior to the occurrence, the starter/generator had been removed
(S) to accommodate the replacement of the starter/generator seal, then re-installed.
Solutionhood
This type of relation is used to link the problem (S) with the solution (N).
(N) The number 2 engine start control valve and starter were replaced, and the aircraft was
returned to service.
(S) It was determined that the number 2 engine starter had failed.
Figure 2 shows a screenshot of the RST annotation of the sample domain data. A software tool,
RSTTool, is used to complete the annotation (O’Donnell, 2000). RSTTool offers a graphical interface
with which annotators segment a given text into text spans and specify relation types between them. A
Perl program written by the first author automatically extracts the RST annotations stored by
RSTTool. In the box at the bottom of Figure 2 can be seen the
decomposed text spans with the individual spans identified by square brackets. Above can be seen the
corresponding RST analysis tree.
[C-GRYC had been modified by the previous owner,] [Da Services Ltd,] [to incorporate an engine start valve
master switch]. [The modification was accepted by Transport Canada when the aircraft was imported into Canada
in 1992]. [The engine start valve master switch was put into the electrical circuit between the engine start switches
and the start valve cutout switches on the engine starter]. [It provides protection for the start circuit up to the start
valve cutout switch].
Figure 2. Screenshot of RST analysis using RSTTool (O’Donnell, 2000)
It is common to use discourse connectives (or cue phrases) for automatic discovery of discourse
relations from texts. For example, by detecting the word but, a contrast relation between two adjacent
texts can be identified. This approach is easy to implement but can lead to a low coverage, i.e. the
ratio of correctly discovered discourse relations to the total number of discourse relations. A study by
Taboada and Mann (2006) showed that the levels of success using cue phrases ranged from 4% for the
‘summary’ relation to over 90% for the ‘concession’ relation. In order to improve the coverage,
machine learning methods have been used. Marcu et al. (2002) used Naive Bayesian probability to
generate lexical pairs that can identify relation types without relying on cue phrases. For example, the
approach can extract a ‘contrast’ when one text contains good words and another bad words, even
when but does not appear. While this approach performs well, the assumption that the lexical pairs are independent of each other means that a very large number of training sentences may be required, sometimes over 1,000,000. Although a low presence of cue phrases can lead to many
undiscovered relations, they can serve as a reference for annotators. Discourse text spans are inserted:
(1) at every period, semicolon, colon, or comma; and (2) at every cue phrase listed in Table 1.
Annotators first refer to the cue phrases to test whether the corresponding relation types can be used
for a given text. If no direct match is identified, then they select the closest one using their judgement.
Table 1 summarises cue phrases extracted from Knott and Dale (1995) and Williams and Reiter
(2003).
RST-annotated texts are converted into a predicate-argument structure, i.e. predicate(tag1:argument1, …, tagn:argumentn). Predicates represent the main verbs in sentences and tags include subjects, objects and prepositions. For example, consider the following sentence: Analysis of the fuel hose indicates that the steel braid strands failed as a result of chafing. For this sentence the ‘evidence’ relation type is used to annotate it as follows: evidence((indicate(subject:analysis of the fuel hose)), (fail(subject:the steel braid strand, pp:as a result of chafing))).
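As an illustrative sketch of this encoding (the paper gives no implementation; the helper names below are assumptions), the predicate-argument structure can be rendered as follows:

```python
def predicate(verb, **tags):
    """Render a predicate with its tag:argument pairs, e.g. indicate(subject:...)."""
    args = ", ".join(f"{tag}:{arg}" for tag, arg in tags.items())
    return f"{verb}({args})"

def relation(rel_type, *preds):
    """Wrap predicate strings in an RST relation type."""
    return f"{rel_type}({', '.join(f'({p})' for p in preds)})"

# Encode: "Analysis of the fuel hose indicates that the steel braid
# strands failed as a result of chafing." under the 'evidence' relation.
annotation = relation(
    "evidence",
    predicate("indicate", subject="analysis of the fuel hose"),
    predicate("fail", subject="the steel braid strand", pp="as a result of chafing"),
)
print(annotation)
```

Running this reproduces the annotation string given in the text above.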
Table 1. Cue phrases for identifying relation types
Relation types Cue phrases
Background With, probably
Cause-Effect Because, since, as, as a consequence, as a result, thus,
therefore, due to, lead to, consequently
Condition as long as, if…then, if, so long as, unless, until
Contrast although, by contrast, even though, however, though, whereas,
while
Elaboration also, in addition, in particular, for example, in general
Evaluation with, so, but, which, even so
Means by, with, using
Purpose in order to, for the purpose of
Solutionhood proposed solution, options
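Cue-phrase matching against Table 1 can be sketched as below; the dictionary reproduces only a subset of the table, and whole-word matching is an assumption rather than the annotators' procedure:

```python
import re

# Subset of the cue phrases in Table 1, mapped to relation types.
CUE_PHRASES = {
    "cause-effect": ["because", "since", "as a result", "therefore", "due to"],
    "contrast": ["although", "but", "however", "whereas", "while"],
    "condition": ["if", "unless", "until"],
    "elaboration": ["also", "in addition", "for example"],
    "purpose": ["in order to", "for the purpose of"],
}

def detect_relations(text):
    """Return the relation types whose cue phrases occur in the text."""
    found = []
    for rel, cues in CUE_PHRASES.items():
        if any(re.search(rf"\b{re.escape(cue)}\b", text, re.IGNORECASE)
               for cue in cues):
            found.append(rel)
    return found

print(detect_relations("The starter failed; however, the fire warning light remained on"))
```

As the text notes, such surface matching is easy to implement but gives low coverage: relations signalled without any cue phrase are missed entirely.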
4.2 Semantic-based QA Description
4.2.1 Term indexing
In general, it is difficult to extract good index terms due to inherent ambiguity in natural language
texts. A term in a text, i.e. an alpha-numeric expression, can have different meanings depending on
the domain in which it is being used, and a term can appear more frequently in one domain than in
another. Publicly accessible dictionaries, e.g. WordNet (Miller et al., 1993), are good resources for obtaining the meanings of terms, both manually and automatically. For example, according to
WordNet, blade has nine meanings. One definition is: especially a leaf of grass or the broad portion
of a leaf as distinct from the petiole. However, another in the engineering domain is: flat surface that
rotates and pushes against air or water. Terms can also be used in different domains with the same
meaning. For example, certification does not have a different meaning in the engineering domain.
Most keyword-based search systems index a document with a list of keywords ranked with relevance
weightings. Whereas these keywords might be sufficient to describe superficially the contents of a
document, it is difficult to interpret the true message if their precise meanings are not established.
NLP, on the other hand, produces a rich representation of a document at a conceptual level. To
achieve human-like language processing, NLP includes a range of computational techniques for
analysing and representing natural texts at one or more levels of linguistic analysis (Liddy, 1998). It is
common to categorise such techniques into the six levels listed below, each of which has a different
analysis capability and implementation complexity (Allen, 1987). The application of NLP to a text can be implemented at the simplest level, e.g. the morphological level, and then extended into a fully-fledged pragmatic analysis that shows a superior understanding but requires large resources and
extensive background information. In this paper, NLP processing includes the first five levels, i.e. it
excludes the pragmatic level.
Morphological level: component analysis of words, including prefixes, suffixes and roots, e.g.
using is stemmed into use.
Lexical level: word level analysis including a lexical meaning and a Part-Of-Speech (POS)
analysis, e.g. apple is a kind of fruit and is tagged as Noun.
Syntactic level: analysis of words in a sentence in order to determine the grammatical structure
of the sentence.
Semantic level: interpretation of the possible meanings of a sentence, including the
customisation of the meanings for given domains.
Discourse level: interpretation of the structure and the meaning conveyed from a group of
sentences.
Pragmatic level: understanding the purposeful use of language in situations particularly those
aspects of language which require world knowledge.
Figure 3 shows the steps of the indexing process. The text in the box at the bottom of Figure 2 is used
as an example.
Step 1: Pre-processing (paragraph identification, sentence decomposition, term identification)
Step 2: Syntactic parse (POS tagging, phrase identification)
Step 3: Lexical look-up (term normalisation, acronym identification)
Step 4: Term weighting (Okapi method)
Figure 3. Steps of the indexing process
Step 1: Pre-processing
One paragraph is identified in the example text, which is then decomposed into four sentences. The
first sentence is:
C-GRYC had been modified by the previous owner, DA Services Ltd, to incorporate an engine start
valve master switch.
Terms are identified as words delimited by white space or a full stop.
Step 2: Syntactic parse
The Apple Pie Parser (Sekine & Grishman, 2001) is used for a syntactic parse that tags POS and
identifies phrases. POS identifies not what a word is, but how it is used. It is useful to extract the
meanings of words since the same word can be used as a verb or a noun in a single sentence or in
different sentences. In a traditional grammar, POS classifies a word into eight categories: verb, noun,
adjective, adverb, conjunctive, pronoun, preposition and interjection. The Apple Pie Parser refers to
the grammars defined in the Penn Treebank to determine the POSs (Marcus et al., 1993). For
example, the first word C-GRYC is tagged as NNPX, i.e. singular proper noun. The remaining POS tags for the sentence above are shown below:
POS taggings: C-GRYC/NNPX had/VBD been/VBN modified/VBN by/IN the/DT previous/JJ
owner/NN DA/NNPX Services/NNPS Ltd/NNP to/TOINF incorporate/VB an/DT engine/NN start/NN
valve/NN master/NN switch/NN.
Phrase identification groups words grammatically, e.g. into Noun Phrases (NPs) such as { the previous
owner DA Services Ltd} and {an engine start valve master switch}.
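Phrase identification over the tag sequence can be sketched as below; this is a simplified chunker that groups maximal runs of noun-phrase tags, whereas the real system relies on the Apple Pie Parser's grammar:

```python
# POS-tagged tokens from the example sentence, with the tags shown above.
tagged = [
    ("C-GRYC", "NNPX"), ("had", "VBD"), ("been", "VBN"), ("modified", "VBN"),
    ("by", "IN"), ("the", "DT"), ("previous", "JJ"), ("owner", "NN"),
    ("DA", "NNPX"), ("Services", "NNPS"), ("Ltd", "NNP"),
    ("to", "TOINF"), ("incorporate", "VB"),
    ("an", "DT"), ("engine", "NN"), ("start", "NN"), ("valve", "NN"),
    ("master", "NN"), ("switch", "NN"),
]

# Tags that may participate in a noun phrase.
NP_TAGS = {"DT", "JJ", "NN", "NNS", "NNP", "NNPS", "NNPX"}

def noun_phrases(tokens):
    """Group maximal runs of NP tags into noun phrases; keep multi-word runs only."""
    phrases, current = [], []
    for word, tag in tokens + [(None, None)]:  # sentinel flushes the last run
        if tag in NP_TAGS:
            current.append(word)
        else:
            if len(current) > 1:
                phrases.append(" ".join(current))
            current = []
    return phrases

print(noun_phrases(tagged))
```

On this sentence the chunker recovers the two NPs quoted in the text, {the previous owner DA Services Ltd} and {an engine start valve master switch}.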
Step 3: Lexical look-up
Each POS-tagged word is compared with WordNet definitions to achieve term normalisation.
Acronym identification extends an acronym found in a text fragment with its full definition. An
example of term normalisation is:
modified → modify
and of acronym identification is:
DA → Dan-Air.
Step 4: Term weighting
Although it is possible to analyse the full contents of a document, this becomes computationally
expensive when the documents are large. For an effective retrieval, it is desirable to extract only those
portions of a document that are useful and to transform them into special formats. Text indexing
determines the central properties of the content of a text in order to differentiate relevant portions of
text from irrelevant ones. The quality of each index term is evaluated to determine if it is an effective
identifier of the text content. A relative importance weighting is then assigned to each index term. A
common approach is to index a document divided into paragraph-sized units. In this paper, the Okapi
algorithm is used (Franz & Roukos, 1994; Robertson et al., 1995). It weights a term $t_k$ in a paragraph $p_j$ as follows:

$$w_{jk} = \frac{c_{jk}}{0.5 + 1.5 \cdot \frac{len(p_j)}{ave\_len} + c_{jk}} \cdot \log \frac{N - n + 0.5}{n + 0.5} \qquad \text{Equation (1)}$$

where $c_{jk}$ is the frequency of the term $t_k$ in the paragraph $p_j$, $N$ is the total number of paragraphs in the dataset, $n$ is the number of paragraphs whose contents contain the term $t_k$, $len(p_j)$ is the total frequency of all terms present in the paragraph $p_j$, and $ave\_len$ is the average number of terms per paragraph. Using this term weighting method, the example sentence is stored in a vector model, i.e. each term is associated with its calculated weighting.
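The Okapi weighting can be sketched in code as follows; it follows the classic Okapi formulation of Robertson et al. (1995), and the paragraph statistics below are made-up toy values:

```python
import math

def okapi_weight(c_jk, len_pj, ave_len, N, n):
    """Okapi term weight: a length-normalised term frequency multiplied by
    an inverse paragraph frequency."""
    tf = c_jk / (0.5 + 1.5 * (len_pj / ave_len) + c_jk)
    idf = math.log((N - n + 0.5) / (n + 0.5))
    return tf * idf

# Toy example: a term appearing twice in a 30-term paragraph,
# in 3 of 24 paragraphs (average paragraph length 25 terms).
w = okapi_weight(c_jk=2, len_pj=30, ave_len=25, N=24, n=3)
print(round(w, 3))
```

Terms that are frequent within a paragraph but rare across the dataset receive the highest weights, which is the behaviour the indexing step relies on.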
4.2.2 Domain knowledge in QA
An engineering taxonomy such as EDIT is a useful means to identify domain-specific terms in a
document. The successful extraction of domain-specific terms can improve the accuracy of QA. For
example, the answer to the question What material should be used for this turbine blade? is more easily
identified if Titanium is marked-up as a type of material. Among the four root concepts defined in
EDIT, only two are used: Issue and Product. These two concepts exhibit different characteristics.
According to Ahmed (Ahmed, 2005), issue categories are considerations designers must take into
account when carrying out a design process. These can be the descriptions of problems arising during
a product’s lifecycle or new design requirements to be satisfied. In contrast, product categories
comprise a hierarchical list of product names, decomposing an overall technical product or system into
smaller and smaller elements. Different techniques are therefore needed to handle them in the
documents. For issue categories, any technique that automatically classifies a document into pre-
defined categories is suitable. For product categories, the technique of Named-Entity (NE)
recognition is used. In the QA method proposed in this paper, the techniques developed by Kim et al.
(2006a; 2006b) are used. The technique for classifying issue categories is described in (Kim et al.,
2006b) and the one for classifying product categories, using probability-based NE identifiers, is
described in (Kim et al., 2006a).
4.2.3 QA overview
Figure 4 shows the overall architecture of the proposed QA system. State-of-the-art QA systems can
achieve an accuracy of up to 80%, as demonstrated by recent tests undertaken using TREC datasets,
which mainly consist of newspaper documents (Voorhees, 2002). However, this level of performance
is not expected to be repeated in other environments. The questions in the above tests were carefully
constructed, i.e. no misspellings, and they were mostly factual and based on a single interaction with a
user, i.e. no dialogue.
The prototype system proposed in this paper does not aim at achieving better accuracy in question
analysis or in finding answers. Instead, its main objective is to demonstrate the efficiency of RST-
based annotations for coherent answer generation, i.e. Answer Generation, see step (5) in Figure 4.
Figure 4. Overall architecture of the proposed QA system
Each of the steps in Figure 4 will now be described.
Step 1: Question Analysis
The Question Analysis Module decomposes a question into three parts: (1) Question Word; (2)
Question Focus; and (3) Question Attribute. The Question Word indicates a potential answer type,
e.g. where, when, etc. The Question Focus is a word, or a sequence of words, that describes the user’s
information needs that are expressed in the question. The Question Attribute is the part of the question
that remains after removing the Question Word and the Question Focus. It is used to rank candidate
answers in a decreasing order of relevance. An example is given below.
Question: What were the consequences of the vibration of the starter/generator?
(1.1) Syntactic parse
POS: What|WP were|VBD the|DT consequences|NNS of|IN the|DT vibration|NN of|IN the|DT
starter|NN /|SYM generator|NN
Phrase identification: NPL{the consequences} PP{the vibration of the starter/generator}
(1.2) Question Word {what}, Question Focus {consequences}, Question Attribute {the vibration of
the starter/generator}
(1.3) EDIT indexes: <Product category=‘Starter_Ducting’> starter <Product
category=‘Electrical_Generator’> generator <Issue category=‘Vibrations’> vibration
(1.4) Relation type: effect
(1.5) Answer format: cause-effect(Question Attribute, <Answer>)
The Question Focus, i.e. consequences, for the example question above is matched with the effect in
the cause-effect relation type. Therefore, the possible answers should be the effects of the events
described in the Question Attribute. For an automatic matching, a semantic similarity between the
Question Focus and the relation type is computed using the method proposed by Resnik (1995). This
method is based on the number of edges in a semantic hierarchy, e.g. WordNet, encountered between
two terms when locating them in the hierarchy.
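The three-way decomposition can be sketched with simple surface heuristics; these heuristics are assumptions for illustration only, since the actual module works from the syntactic parse shown above:

```python
def analyse_question(question):
    """Split a wh-question into (Question Word, Question Focus, Question Attribute)
    using the pattern '<wh-word> <aux> the <focus> of <attribute>'."""
    words = question.rstrip("?").split()
    q_word = words[0].lower()
    rest = words[1:]
    # Skip the auxiliary/copula verb and a leading determiner.
    if rest and rest[0].lower() in {"is", "are", "was", "were", "did", "does", "do"}:
        rest = rest[1:]
    if rest and rest[0].lower() == "the":
        rest = rest[1:]
    # The Focus is the head noun before 'of'; the remainder is the Attribute.
    if "of" in rest:
        cut = rest.index("of")
        return q_word, " ".join(rest[:cut]), " ".join(rest[cut + 1:])
    return q_word, rest[0] if rest else "", " ".join(rest[1:])

print(analyse_question("What were the consequences of the vibration of the starter/generator?"))
```

On the example question this recovers the same three parts listed in (1.2) above.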
Step 2: Answer Retrieval
The Answer Retrieval Module uses the Question Attribute identified by the Question Analysis Module
to select paragraphs that might contain candidate answers. A cosine-based similarity calculation is
used for ranking the selected paragraphs in order of relevance to the keywords that appear in the
Question Attribute.
$$sim(q, p_j) = \frac{\sum_{i=1}^{t} w_i \cdot w_{ij}}{\sqrt{\sum_{i=1}^{t} w_i^2 \cdot \sum_{i=1}^{t} w_{ij}^2}} \qquad \text{Equation (2)}$$

where $p_j$ is a given paragraph, $q$ is a Question Attribute, $w_i$ is the weight of the term $t_i$ in the Question Attribute, and $w_{ij}$ is the weight of the term $t_i$ in the paragraph $p_j$. The similarity value is normalised by the total weights of the common words.
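The cosine-based ranking can be sketched over term-weight vectors as follows; the weights below are illustrative toy values, not output of the indexing step:

```python
import math

def cosine_sim(q_weights, p_weights):
    """Cosine similarity between the Question Attribute and a paragraph,
    both represented as {term: weight} vectors."""
    common = set(q_weights) & set(p_weights)
    numerator = sum(q_weights[t] * p_weights[t] for t in common)
    q_norm = math.sqrt(sum(w * w for w in q_weights.values()))
    p_norm = math.sqrt(sum(w * w for w in p_weights.values()))
    if q_norm == 0 or p_norm == 0:
        return 0.0
    return numerator / (q_norm * p_norm)

q = {"vibration": 0.8, "starter": 0.6, "generator": 0.5}
p = {"vibration": 0.7, "starter": 0.4, "fire": 0.9}
print(round(cosine_sim(q, p), 3))
```

Paragraphs are then ranked in decreasing order of this similarity to the Question Attribute.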
Step 3: Answer Extraction and Step 4: Answer Scoring
The Answer Extraction Module examines the paragraphs selected in the Answer Retrieval Module in
order to select text spans from which candidate answers can be extracted. The EDIT indexes,
Question Focus, and Question Attribute are used to determine whether the text spans contain the
answer. In doing so, it is necessary to measure how well they are related to the question. An overall
similarity between the text spans and the question is computed by summing the following three similarity
scores: (1) the score reflecting whether a given text span is classified with the EDIT indexes; (2) the
score reflecting whether a given text span contains the Question Focus; and (3) the score reflecting the
degree of similarity between a given text span and the Question Attribute. They are summed as
follows:
$$s(ts_{ij}) = \alpha \cdot s\_edit(ts_i) + \beta \cdot s\_rst(ts_i) + (1 - \alpha - \beta) \cdot s\_attr(ts_i) \qquad \text{Equation (3)}$$

where $s(ts_{ij})$ is the score of the text span $ts_i$ in the paragraph $p_j$, and $\alpha$ $(0 \le \alpha \le 1)$ and $\beta$ $(0 \le \beta \le 1)$ are used to normalise the score $s(ts_{ij})$ to lie between 0 and 1. $s\_edit(ts_i)$ is defined as:

$$s\_edit(ts_i) = \frac{1}{N} \sum_{k \in positions} s(ind_{ik}) \qquad \text{Equation (4)}$$

where $s(ind_{ik})$ is a Boolean indicating whether or not a given text span is classified with an EDIT index number $ind_{ik}$, $positions$ is the set of matches against the EDIT indexes returned by the Question Analysis Module, and $N$ is the total number of elements in this set. $s\_rst(ts_i)$ is a Boolean variable that is true if the RST annotation for the text span matches the annotation for the question. $s\_attr(ts_i)$ is defined as:

$$s\_attr(ts_i) = \frac{1}{M} \sum_{m=1}^{n} s(t_{im}) \qquad \text{Equation (5)}$$

where $s(t_{im})$ is a Boolean indicating whether or not a given term $t_m$ in the text span $ts_i$ is matched with a term in the Question Attribute, and $M$ is the total number of terms in the Question Attribute. The scored text spans are sorted in decreasing order by value and those above a pre-defined threshold are selected.
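Combining the three similarity scores can be sketched as below; the function names and Boolean inputs are illustrative assumptions, and the weight values mirror those used later in the pilot study:

```python
def s_edit(matches, N):
    """Fraction of the question's EDIT indexes that also classify the
    text span (matches is a list of Booleans)."""
    return sum(matches) / N

def s_attr(term_matches, M):
    """Fraction of Question Attribute terms matched by terms in the span."""
    return sum(term_matches) / M

def span_score(edit_matches, N, rst_match, attr_matches, M,
               alpha=0.3, beta=0.2):
    """Weighted sum of the EDIT, RST, and attribute similarity scores."""
    return (alpha * s_edit(edit_matches, N)
            + beta * float(rst_match)
            + (1 - alpha - beta) * s_attr(attr_matches, M))

# A span matching 1 of 2 EDIT indexes, the question's relation type,
# and 3 of 4 Question Attribute terms:
score = span_score([True, False], N=2, rst_match=True,
                   attr_matches=[True, True, True, False], M=4)
print(round(score, 3))
```

Spans are then sorted by this score and those above the threshold are passed to the Answer Generation Module.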
Step 5: Answer Generation
For coherent answer generation, duplicate sentences or clauses should be removed. Two sentences or clauses are considered duplicates if they are exactly equivalent or if they differ only in their level of generality. For
example, the following two clauses are equivalent: the starter had failed and the failure of the number
2 engine starter. On the other hand, sentence 1 below can be replaced by sentence 2 without loss of
information because sentence 2 subsumes the information in sentence 1.
Sentence 1: It is probable that a short circuit in the engine wiring harness allowed the number 2
engine start valve to re-open, causing the number 2 engine starter to over speed and subsequently fail.
Sentence 2: It is probable that a short circuit in the engine wiring harness allowed the number 2
engine start valve to re-open, causing the number 2 engine starter to over speed and subsequently fail,
resulting in an engine fire.
Automatic text summarisation systems employ various approaches to compare similar sentences that
have different wordings (Mani & Maybury, 1999). In general, these systems use the following two
steps to produce summaries from a document:
Step 1 identifies and extracts important sentences to be included in a summary.
Step 2 synthesises the extracted sentences to form a summary.
There are two common methods of synthesis in step 2. The non-extractive summary method
suppresses repeated sentences either by extracting a subset of the repetitions or by selecting common
terms. It then reformulates the reduced number of sentences to produce the summary (Barzilay et al.,
1999; Jing & McKeown, 2000). The extractive summary method focuses on the extraction of
important sentences and assembles them sequentially to produce the summary. The objective of these
automatic summarisation systems is to create a shortened version of an original text in order to reduce
the time spent reading and comprehending it. The objective of our proposed approach, on the other
hand, is to extend and synthesize text spans to allow the generation of coherent answers.
In the proposed approach, two text spans are compared to determine whether or not both are similar
using Equation (2). Text spans that have higher similarity values than the pre-defined threshold are
excluded. The algorithm of the Answer Generation Module is shown below as pseudo-code.
Variable definition:
answerList: chains of ‘cause and effect’ to be generated by the Answer Generation Module
textspanList: a list of text spans returned by the Answer Scoring Module
relationlinkedList: a list of text spans linked through the ‘cause and effect’ relation obtained
from the RST annotations.
t: one text span being examined and extracted from the textspanList
thresh: a pre-defined threshold used to compare similarity between two text spans
t1: temp variable, t2: temp variable
Initialisation:
    answerList ← empty
Repeat (
    t ← retrieve(textspanList), thresh ← 0.8, t1 ← empty, t2 ← empty
    IF (answerList does not contain t)
    THEN { build chains of ‘cause and effect’ for t and merge them with answerList
        foreach t1 in relationlinkedList(t)
        {
            foreach t2 in answerList
            {
                compute similarity between t1 and t2
                IF (similarity > thresh) { NOTHING }
                ELSE { update(answerList, t1) }
            }
        }
    }
    ELSE { NOTHING }
    remove(textspanList, t)
)
Until (textspanList is EMPTY)
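The algorithm above can be sketched as runnable Python; a simple Jaccard word overlap stands in for the Equation (2) similarity, and the data structures are assumptions:

```python
def jaccard(a, b):
    """Placeholder similarity: word overlap between two text spans."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def generate_answer(textspans, related, thresh=0.8):
    """Build 'cause and effect' chains, skipping near-duplicate spans.

    textspans: scored spans from the Answer Scoring Module.
    related:   maps a span to the spans linked to it by 'cause and effect'
               relations in the RST annotations.
    """
    answer_list = []
    for t in textspans:
        if t in answer_list:
            continue
        for t1 in related.get(t, [t]):
            # Keep t1 only if it is not too similar to anything already kept.
            if all(jaccard(t1, t2) <= thresh for t2 in answer_list):
                answer_list.append(t1)
    return answer_list

spans = ["the starter had failed", "the starter had failed"]
links = {"the starter had failed": ["the starter had failed",
                                    "an engine fire resulted"]}
print(generate_answer(spans, links))
```

The duplicate input span is suppressed, while the linked effect span extends the chain, which is the behaviour the module is designed to produce.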
5 Pilot study
This section presents a preliminary evaluation of the prototype QA system. The evaluation tests the
following two hypotheses: (1) the proposed system is efficient at extracting and presenting answers to
causal questions using relation types, and (2) the presentation of the synthesised answers helps users to
understand the retrieved results. The first hypothesis was evaluated by comparing the performance of
the prototype system against that of a standard QA system. The standard system is different from the
prototype system in the following ways:
Step 1: the Answer Format is revised as <Answer> Question Focus Question Attribute. This format
indicates that the system should find text spans that have syntactic variations with the Question Focus
and semantic similarities with the Question Attribute. Using the example question in Section 4.2.3, a
potential answer text span can be <Answer> is the consequence of <Question Attribute>.
Step 2: is not revised.
Step 3 and Step 4: Equation (3) is revised as: $s(ts_{ij}) = s\_attr(ts_i)$.
Step 5: is not used.
With the standard system, multiple instances were extracted without synthesising them. This
comparison examined whether the answer generation method described in Section 4.2 avoids repeated
information and generates coherent answers. The second hypothesis was evaluated by measuring user
performance in a simple trial.
5.1 Trial dataset
For the trial, three official aircraft incident investigation reports were downloaded from websites5.
Although the incidents happened to three different aircraft types (Boeing 727-217, Bell 205A-1
helicopter, and Boeing 737-8AS), the incidents share a common cause, i.e. an in-flight engine fire.
Although the reports were written by different incident investigation teams, they share a broadly
similar terminology, e.g. emergency landing, engine starter valve, etc. After removing embedded
HTML tags and images, the average document length was 1820 words, or 24 paragraphs. Each
document was first indexed as described in Section 4.2.1, and RST annotations were applied by the
first author using RSTTool. A total of 194 relations were annotated. The total numbers for each relation type were: Background = 20, Cause-Effect = 45, Condition = 16, Contrast = 30, Elaboration =
32, Evaluation = 24, Means = 10, Purpose = 12, and Solutionhood = 5.
5.2 The trial
Six Engineering graduate students and two members of the Engineering Department staff of
Cambridge University participated in the trial. A brief introduction to the trial and trial dataset was
given to the participants. Each participant was asked to answer multiple questions and their
performance and accuracy were measured. The trial consisted of reviewing the answers to three
questions, i.e. one for each incident report. These answers were split into two groups. The first group
was extracted and synthesized using the prototype QA system and the second was extracted using a
standard QA system. For a fixed period of time, the participants were instructed to read the answers
on-line for both systems (see Figure 5), and follow links to the original document if desired. After
this, the participants were given a further list of questions related to the answers they had just read.
5 (1) http://www.tsb.gc.ca/en/reports/air/1996/a96o0125/a96o0125.asp,
(2) http://www.tsb.gc.ca/en/reports/air/2002/A02C0114/A02C0114.asp,
(3) http://www.aaib.gov.uk/sites/aaib/cms_resources/dft_avsafety_pdf_029538.pdf
Their answers to these questions were used to test their understanding of the answers they had just
seen. In order to avoid the evaluation problems caused by the inclusion of incorrect answers, both
groups of answers were examined in order to verify that they were all true.
Figure 5. An example screenshot of the proposed system
Table 2 shows the three initial questions, along with the associated questions that were used to test the
users’ understanding.
Table 2: The three original questions along with their associated questions
Question 1: What triggered the engine fire alarm on the Boeing 727-217?
1. On which engine of the Boeing 727-217 was the fire alarm observed?
2. What were the consequences of the failure of the number 2 engine starter?
3. Why did the number 2 engine starter overspeed?
4. Did the starter valve of the number 2 engine close after the engine was started?
5. Why did the number 2 engine starter valve re-open?
6. How can we determine if the starter valve is open?
Question 2: What triggered the engine fire alarm on the Bell 205A-1 helicopter?
1. Why did the starter/generator start to vibrate?
2. What were the consequences of the vibration of the starter/generator?
3. Why was the hold-down nut at the 12 o’clock position left-out?
4. Was the engine fire alarm activated due to the abrasion of the cooling fan?
Question 3: What triggered the shut-down of the engine 2 on the Boeing 737-8AS?
1. Does this incident have the same engineering problem as the Boeing 727-217?
2. Did the failure of No 4 bearing in the number 2 engine contribute to the event?
3. What were the consequences of the presence of the engine vibration?
5.3 Trial results
Answers to the three original questions shown in Table 2 were extracted and synthesized using the method described in Section 4.2. The answers were then compared to the set of answers prepared in Section 5.2. The threshold for the Answer Retrieval Module, i.e. the value for Equation (2), was set to 0.5, meaning that the paragraphs with cosine-similarity values over 0.5 were selected. The values of $\alpha$ and $\beta$ specified in Equation (3) were set to 0.3 and 0.2, respectively. The threshold for the Answer Scoring Module, i.e. the value for Equation (3), was set to 0.2.
The results are shown using two tables, i.e. Table 3 and Table 4. Table 3 summarises the results of the
‘cause and effect’ chains generated by the proposed QA. Table 4 compares the performance of the
proposed QA on retrieving correct text spans with that of the standard QA. Precision and recall were
used to measure the performance. In this paper, precision is defined as the number of right text spans retrieved as a percentage of the total number of retrieved text spans. Recall is the number of right text spans retrieved as a percentage of the total number of right text spans.
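These definitions amount to the following short sketch, with made-up counts for illustration:

```python
def precision(retrieved_right, retrieved_total):
    """Fraction of retrieved text spans that are right."""
    return retrieved_right / retrieved_total

def recall(retrieved_right, total_right):
    """Fraction of all right text spans that were retrieved."""
    return retrieved_right / total_right

# Toy example: 6 of 9 retrieved spans are right, out of 7 right spans overall.
print(round(precision(6, 9), 2), round(recall(6, 7), 2))
```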
Table 3. The overview of ‘cause and effect’ chains generated by the proposed QA
Num. of paragraphs Num. of text spans Depth of the chains
Question 1 17 12 4
Question 2 8 9 4
Question 3 11 6 2
The second column in Table 3 specifies the number of paragraphs returned by the Answer Retrieval
Module and the third one specifies the number of text spans returned by the Answer Extraction and
Scoring Modules. The fourth one specifies the depth of the chains, i.e. the number of cause and effect
nodes along the longest path from the root node down to the farthest leaf node in the chains. For
example, for Question 1, one ‘cause and effect’ chain with a depth of 4 was generated by
synthesising and extending the 12 text spans extracted from the 17 paragraphs.
The following shows examples of correct and incorrect text spans for Question 1.
Correct text spans:
(1) The failure of the number 2 engine starter resulted in an engine fire.
(2) The hazard associated with an engine fire caused by a starter failure was recognized and
addressed in AWD 83-01-05 R2.
(3) It is probable that a short circuit in the engine wiring harness allowed the number 2 engine start
valve to re-open, causing the number 2 engine starter to over speed and subsequently fail, resulting
in an engine fire.
Incorrect text spans:
(1) Because of the engine’s proximity to the elevator and rudder control systems, a severe in-flight fire
in the number 2 engine is potentially more serious than a fire in either the number 1 or 3 engine.
(2) Fire damage to the engine component wiring precluded any significant testing of the wiring
harness.
(3) Two fire bottles were discharged into the number 2 engine compartment; however, the fire
warning light remained on.
Table 4. Comparison of two QA systems for the task of retrieving correct text spans
STANDARD QA PROPOSED QA
Precision Recall Precision Recall
Question 1 0.45 0.5 0.67 1
Question 2 0.6 0.67 0.78 0.78
Question 3 0.67 0.57 0.83 0.79
Average 0.57 0.58 0.76 0.86
As shown in Table 4, on average, the proposed QA achieved 76% precision and 86% recall when
retrieving text spans for three questions. On the other hand, the standard QA achieved 57% precision
and 58% recall. This suggests that the proposed QA has considerable potential for extracting and
synthesizing answers to causal questions. The task of retrieving text spans is similar to the sentence
selection task in automatic text summarisation systems.
The text summarisation systems referred to earlier in this paper are by Barzilay et al. (1999) and by
Jing and McKeown (2000). In the context of multi-document summarisation, Barzilay et al. (1999)
focused on the generation of paraphrasing rules that were used to compare semantic similarity between
two sentences. They tested the rules for the task of identifying common phrases among multiple
sentences. The automatically generated common phrases were then reviewed by human judges. The
reviews identified 39 common phrases, of which the system correctly identified 29. In addition, the identified phrases contained 69% of the correct subjects and 74% of the correct main verbs.
On average, the system achieved 72% accuracy.
Jing and McKeown (2000) carried out three evaluations. The first tested whether the automatic
summarisation system could identify a phrase in the original text that corresponds to the selected
phrase in a human-written abstract. When tested with 10 documents, the automatic system achieved
82% precision and 79% recall on average. The second evaluation tested whether the automatic system
could remove extraneous sentences, i.e. sentence reduction. The result showed that 81% of the
reduction decisions made by the system agreed with those of humans. The third evaluation tested
whether the automatic system could generate coherent summaries. The system achieved 6.1 points out
of 10, i.e. 61% accuracy for generating coherent summaries. Only the first evaluation focused on the
sentence selection.
The performance of our proposed QA when retrieving correct text spans with 76% of precision and
86% recall is slightly better than the work by Barzilay et al. (1999), i.e. 72% accuracy, and comparable
to the work by Jing & McKeown (2000), i.e. 82% precision and 79% recall.
On average, the users in the first group, i.e. those who read the answers given by the proposed QA, incorrectly answered two of the 13 questions, whereas the users in the second group incorrectly answered five. On average, the users in the first group completed the trial
within 19 minutes, and the users in the second group completed the trial within 25 minutes. Five of
the 13 questions were correctly answered by all the users in the first group, whereas just one question
was correctly answered by all the users in the second group. All users in the second group incorrectly
answered question number 6 ‘how can we determine if the starter valve is open’.
Although the preliminary results are encouraging, it is difficult to draw firm conclusions from this trial
for the following reasons: (1) the low number of users in the two groups; and (2) the number of causal
relations in the trial dataset was small. Users in the first group expressed the opinion that the
synthesized chains of ‘cause and effect’ description were helpful in understanding the causes of the
three incidents.
6 Conclusion and further work
Researchers in computational linguistics have speculated that the relation types defined in RST can
improve the performance of QA systems when answering complex questions. The class of causal
reasoning questions, either predictive or diagnostic, is one that we have shown might be better
answered using these relation types. The reason for this is that the majority of causal questions can be
answered in multiple ways, i.e. it is difficult to pinpoint particular causes and regard them as
independent of the remaining information. Generally, identifying the causes of a specific event
involves creating chains of ‘cause and effect’ relations. Without a deep understanding of all the
relevant information contained in a document, it is not possible to derive such causal chains
automatically. It is still not known how users would like such causal chains to be presented, and it is
not suggested that the interface proposed in this paper is necessarily the best. The contribution of this
paper is the demonstration of a method for synthesizing causal information into coherent answers.
The source information can be scattered over different parts of a single document or over multiple
documents. The pilot study indicated that the proposed QA was more efficient at extracting and
synthesizing answers when compared with standard QA, i.e. a 19 percentage point increase in precision and a 28 percentage point increase in recall. The pilot study also indicated that the synthesized chains
36
of ‘cause and effect’ descriptions were helpful not only for quickly understanding the direct causes of
the three incidents but also for being aware of related contexts along with the rationales for the causes
of the incidents.
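The chaining of ‘cause and effect’ relations described above can be illustrated with a minimal sketch. The relation representation, function names and incident phrases below are hypothetical, not the prototype's actual data model: each RST ‘cause’ relation links a satellite (the cause) to a nucleus (the effect), and walking the index backwards from a queried event yields a linear causal chain.

```python
# A minimal sketch of causal-chain synthesis over RST 'cause' annotations.
# Relation dictionaries and event labels are illustrative assumptions.

def build_cause_index(relations):
    """Map each effect span (nucleus) to the spans annotated as its causes."""
    index = {}
    for rel in relations:
        if rel["type"] == "cause":
            index.setdefault(rel["nucleus"], []).append(rel["satellite"])
    return index

def causal_chain(event, index, seen=None):
    """Trace one chain of causes back from `event` (depth-first, cycle-safe)."""
    seen = seen or set()
    chain = [event]
    for cause in index.get(event, []):
        if cause not in seen:
            seen.add(cause)
            chain = causal_chain(cause, index, seen) + chain
            break  # follow a single chain to keep the answer linear
    return chain

relations = [
    {"type": "cause", "satellite": "valve stuck open", "nucleus": "starter overspeed"},
    {"type": "cause", "satellite": "starter overspeed", "nucleus": "starter disintegration"},
]
index = build_cause_index(relations)
print(" -> ".join(causal_chain("starter disintegration", index)))
# -> valve stuck open -> starter overspeed -> starter disintegration
```

The relevant relations may be scattered across documents; the index simply accumulates them regardless of their source.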
The main objective is to improve the understanding of the answers generated by QA systems. An
answer is considered to be coherent if duplicate expressions are eliminated and if it is appropriately
extended with additional information. This additional information should help users verify the
answers and increase their awareness of relevant domain information. Using RST annotations, it has
been shown that it is feasible to compare and integrate the information at a semantic level. This leads
to a way of presenting answers in a more natural manner. A pilot trial demonstrated that the answers
generated by the prototype QA system led to more rapid and improved understanding of those
answers.
Further work is planned with the aim of improving the performance of the prototype system in three
ways. First, since engineers have varying levels of domain expertise, the system should consider the
preferences and profiles of individuals. Inexperienced engineers might have very broad information
requests and prefer to explore the domain, whereas experienced engineers might have detailed
information requests aimed at refining their existing knowledge. Novice engineers require more
background information, probably assembled using ‘elaboration’ or ‘background’ relation types.
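A profile-sensitive answer extension of the kind suggested above might be sketched as follows. The function, profile labels and satellite texts are illustrative assumptions, not part of the prototype: novices receive ‘background’ and ‘elaboration’ satellites appended to the core answer, while experts receive only the core answer.

```python
# A hypothetical sketch of profile-driven answer extension using RST satellites.

def extend_answer(core_answer, satellites, expertise):
    """Append the RST satellites appropriate to a user's expertise level."""
    wanted = {"novice": ("background", "elaboration"), "expert": ()}[expertise]
    extras = [text for rel_type, text in satellites if rel_type in wanted]
    return " ".join([core_answer] + extras)

satellites = [
    ("background", "The starter motor spins the engine before ignition."),
    ("elaboration", "The valve is held open by regulated air pressure."),
]
print(extend_answer("The starter valve is open.", satellites, "novice"))
print(extend_answer("The starter valve is open.", satellites, "expert"))
```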
Second, synthesising sentences extracted from different documents is crucial to generate answers that
are longer than one sentence. When writing a sequence of linked sentences, authors often replace
noun phrases by pronouns, or by shortened forms of the phrase, in subsequent sentences, e.g. ‘the
number 2 engine starter’ becomes ‘it’ or ‘the starter’. Coreference (anaphora) resolution is the process
of determining which expressions refer to the same entity and is a key issue in computational sentence
synthesis. However, the main focus of research in this area has been on the resolution of personal
pronouns, e.g. ‘he’, ‘him’ and ‘his’. Various techniques have been proposed for automatic coreference
identification, and it is planned to extend the prototype QA system by adapting them.
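One simple heuristic for the shortened-form case (distinct from the published techniques the text refers to) is to accept a candidate antecedent when the shortened phrase shares its head noun with the full phrase and its content words are a subset of the full phrase's words. The function name and examples below are illustrative assumptions.

```python
# An illustrative heuristic for matching a shortened noun phrase against its
# full antecedent; not a substitute for full coreference resolution.

def is_shortened_form(short, full):
    """True if `short` could abbreviate `full`: same head noun, words a subset."""
    stop = {"the", "a", "an"}
    s_content = [w for w in short.lower().split() if w not in stop]
    f_content = [w for w in full.lower().split() if w not in stop]
    return (bool(s_content)
            and s_content[-1] == f_content[-1]     # same head noun
            and set(s_content) <= set(f_content))  # no extra content words

print(is_shortened_form("the starter", "the number 2 engine starter"))  # True
print(is_shortened_form("the valve", "the number 2 engine starter"))    # False
```

Pronoun resolution (‘it’, ‘he’) needs more context than this string-level check and would rely on the adapted techniques mentioned above.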
Third, the crucial issue of automatic RST annotation will be addressed since this is essential for the
practical application of the system. Kim et al. (2004) have applied a machine learning algorithm, i.e.
Inductive Logic Programming (ILP), to analyse documents created using the Design Rationale editor
(DRed). This enabled the automatic identification of the relation types (Bracewell & Wallace, 2003,
Bracewell et al., 2004). Tests have demonstrated approximately 80% accuracy. This high figure can
be attributed partly to the structure of the DRed documents in the dataset. These documents are
carefully structured using an argumentation model derived from that of IBIS (Kunz & Rittel, 1970).
The documents comprise linked textual elements of a predefined set of types. These element types
include ‘issue’, ‘answer’ and ‘argument’. The links between them are directed but untyped. This
algorithm will be extended to deal with other types of documents, e.g. Web pages and unstructured
texts.
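The DRed document structure described above, typed textual elements joined by directed but untyped links, can be sketched as a small data model. The class and field names are illustrative assumptions; the closing comment merely suggests one kind of feature a learner such as ILP might exploit.

```python
# A minimal sketch of the DRed/IBIS-derived structure: typed elements
# ('issue', 'answer', 'argument') with directed, untyped links.

from dataclasses import dataclass, field

@dataclass
class Element:
    kind: str                                  # 'issue', 'answer' or 'argument'
    text: str
    links: list = field(default_factory=list)  # directed, untyped links

issue = Element("issue", "Why did the starter disintegrate?")
answer = Element("answer", "The starter valve stuck open.")
argument = Element("argument", "Overspeed is consistent with the debris found.")
issue.links.append(answer)
answer.links.append(argument)

# A learner could use (source kind, target kind) pairs as one feature when
# inferring an RST relation type for each untyped link:
pairs = [(issue.kind, linked.kind) for linked in issue.links]
print(pairs)  # [('issue', 'answer')]
```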
The main objective of this research was to answer more complex questions than current QA systems
are capable of answering. There are five modules in the architecture of the proposed QA system:
Question Analysis; Answer Retrieval; Answer Extraction; Answer Score; and Answer Generation.
The Question Analysis Module analyses the question in terms of the Question Word, Question Focus
and Question Attribute. The next three modules retrieve, extract and score answers from documents
that have been manually annotated and semi-manually indexed. The manual annotation is based on
nine of the 33 relation types defined in RST. The semi-manual indexing uses the issue and product
categories of the EDIT engineering taxonomy. The main contribution of this research lies in the fifth
module. This module synthesises causal information into coherent answers, drawing information from
both different parts of a single document and from multiple documents. A prototype implementation
shows promise, but additional testing is required. Further developments are proposed that will: (1)
allow the system to take into account the preferences and profiles of users; (2) extend the system to
include coreference identification; and (3) eliminate the manual annotation of documents. As with all
computer support systems, the interface is critical and here further empirical research is needed.
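The five-module data flow summarised above can be sketched schematically. The function bodies below are placeholders, not the prototype's algorithms; only the pipeline shape and the Question Word/Focus/Attribute decomposition mirror the text.

```python
# A schematic sketch of the five-module QA pipeline; all internals are
# placeholder assumptions, only the data flow follows the architecture.

def analyse_question(question):
    """Question Analysis: split into Question Word, Focus and Attribute."""
    word, _, rest = question.partition(" ")
    return {"word": word, "focus": rest, "attribute": None}

def retrieve(analysis, annotated_docs):
    """Answer Retrieval over RST-annotated, taxonomy-indexed documents."""
    return annotated_docs

def extract(analysis, docs):
    """Answer Extraction: pull candidate segments from retrieved documents."""
    return [seg for doc in docs for seg in doc["segments"]]

def score(analysis, candidates):
    """Answer Score: rank candidates (placeholder ranking)."""
    return sorted(candidates, key=len)

def generate(candidates):
    """Answer Generation: synthesize candidates into one coherent answer."""
    return " ".join(candidates)

docs = [{"segments": ["The starter valve stuck open.", "This caused overspeed."]}]
analysis = analyse_question("Why did the starter disintegrate?")
answer = generate(score(analysis, extract(analysis, retrieve(analysis, docs))))
print(answer)
```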
Acknowledgements
This work was funded by the University Technology Partnership for Design, which is a collaboration
between Rolls-Royce, BAE SYSTEMS and the Universities of Cambridge, Sheffield and
Southampton. We also thank S. Banerjee and T. Pedersen for software implementing the similarity
measure proposed by Resnik (1995).
References
80-20 Software. (2003). 80-20 Retriever Enterprise Edition. http://www.80-20.com/brochures/Personal
Email Search Solution.pdf
Ahmed, S. (2005). Encouraging Reuse of Design Knowledge: A Method to Index Knowledge. Design
Studies Journal, 26(6), 565-592.
Ahmed, S., Kim, S., & Wallace, K. M. (2005). A methodology for creating ontologies for engineering
design. Proc. ASME 2005 Int. Design Engineering Technical Conf. on Computers and Information in
Engineering, DETC 2005-84729. U.S.A.
Allen, J. (1987). Natural Language Understanding. Benjamin/Cummings Publishing Company, Inc.
Aunimo, L., & Kuuskoski, R. (2005). Question Answering Using Semantic Annotation. Proc. Cross
Language Evaluation Forum (CLEF). Austria.
Barzilay, R., McKeown, K. R., & Elhadad, M. (1999). Information Fusion in the Context of Multi-
Document Summarization. Proc. 37th Annual Meeting of the Association for Computational
Linguistics, pp. 550-557. U.S.A.
Bosma, W. (2005). Extending Answers using Discourse Structure. Proc. Workshop on Crossing
Barriers in Text Summarization Research in RANLP, pp. 2-9. Bulgaria.
Bracewell, R. H., Ahmed, S., & Wallace, K. M. (2004). DRed and design folders: a way of capturing,
storing and passing on knowledge generated during design projects. Proc. Design Automation Conf.,
ASME. U.S.A.
Bracewell, R. H., & Wallace, K. M. (2003). A tool for capturing design rationale. Proc. 14th Int. Conf.
on Engineering Design, pp. 185-186. Stockholm.
Brill, E., Lin, J., Banko, M., Dumais, S. T., & Ng, A.Y. (2001). Data-intensive question answering.
Proc. Tenth Text REtrieval Conf. (TREC 2001), pp. 183-189. U.S.A.
Burger, J., Cardie, C., Chaudhri, V., Gaizauskas, R., et al. (2001). Issues, Tasks and Program
Structures to Roadmap Research in Question & Answering (QA). NIST.
Burstein, J., Marcu, D., & Knight, K. (2003). Finding the write stuff: Automatic identification of
discourse structure in student essays. IEEE Intelligent Systems, Jan/Feb, 32-39.
Diekema, A. R., Yilmazel, O., Chen, J., Harwell, S., He, L., & Liddy, E. D. (2004). Finding Answers
to Complex Questions. In New Directions in Question Answering (Maybury, M. T., Ed.), pp. 141-152.
AAAI-MIT Press.
Franz, M., & Roukos, S. (1994). TREC-6 Ad-Hoc Retrieval. Proc. of the Sixth Text REtrieval Conf.
(TREC-6), pp. 511-516.
Hai, D., & Kosseim, L. (2004). The Problem of Precision in Restricted-Domain Question-Answering:
Some Proposed Methods of Improvement. Proc. Workshop on Question Answering in Restricted
Domains in ACL, pp. 8-15. Barcelona.
Hickl, A., Lehmann, J., Williams, J., & Harabagiu, S. (2004). Experiments with Interactive Question
Answering in Complex Scenarios. Proc. North American Chapter of the Association for
Computational Linguistics annual meeting (HLT-NAACL), U.S.A.
Hovy, E. H. (1993). Automated Discourse Generation Using Discourse Structure Relations. Artificial
Intelligence, 63(1-2), 341-385.
Jing, H., & McKeown, K. R. (2000). Cut and Paste Based Text Summarization. Proc. 1st Meeting of
the North American Chapter of the Association for Computational Linguistics (HLT-NAACL), pp.
178-185. U.S.A.
Kim, S., Bracewell, R.H., & Wallace, K.M. (2004). From discourse analysis to answering design
questions. Proc. Int. Workshop on the Application of Language and Semantic Technologies to support
Knowledge Management Processes, pp. 43-49. U.K.
Kim, S., Ahmed, S., & Wallace, K. M. (2006a). Improving document accessibility through ontology-
based information sharing. Proc. Int. Symposium series on Tools and Methods of Competitive
Engineering, pp. 923-934. Slovenia.
Kim, S., Bracewell, R.H., Ahmed, S., & Wallace, K. M. (2006b). Semantic Annotation to Support
Automatic Taxonomy Classification. Proc. Int. Design Conference (Design 2006), pp. 1171-1178.
Croatia.
Knott, A., & Dale, R. (1995). Using linguistic phenomena to motivate a set of coherence relations.
Discourse Processes, 18(1), 35-62.
Kunz, W., & Rittel, H. W. J. (1970). Issues as Elements of Information Systems. Working Paper 131.
Center for Planning and Development Research, Berkeley, U.S.A.
Kwok, C., Etzioni, O., & Weld, D. S. (2001). Scaling Question Answering to the Web. Proc. of the
10th Int. Conf. on World Wide Web, pp. 150-161. Hong Kong.
Liddy, E. D. (1998). Enhanced Text Retrieval Using Natural Language Processing. Bulletin of the
American Society for Information Science and Technology, 24(4), 14-16.
Lin, J., Quan, D., Sinha, V., Bakshi, K., Huynh, D., Katz, B., & Karger, D. R. (2003). What Makes a
Good Answer? The Role of Context in Question Answering. Proc. of the IFIP TC13 Ninth Int. Conf.
On Human-Computer Interaction, Switzerland.
Lopez, V., Pasin, M., & Motta, E. (2005). AquaLog: An Ontology-Portable Question Answering
System for the Semantic Web. Proc. of the Second European Semantic Web Conference (ESWC), pp.
546-562. Greece.
Mani, I., & Maybury, M. (1999). Advances in Automatic Text Summarisation. The MIT Press.
Mann, W., & Thompson, S. (1988). Rhetorical structure theory: Toward a functional theory of text
organization. Text, 8(3), 243-281.
Marcu, D., & Echihabi, A. (2002). An unsupervised approach to recognising discourse relations. Proc.
of the 40th Annual Meeting of the Association for Computational Linguistics, pp. 368-375. U.S.A.
Marcu, D. (1999). Discourse trees are good indicators of importance in text. In Advances in Automatic
Text Summarization (Mani, I., & Maybury, M., Eds.). MIT Press.
Marsh, J.R., & Wallace, K. (1997). Observations on the Role of Design Experience. Proc. WDK
Annual Workshop, Switzerland.
Miller, G. A., Beckwith, R. W., Fellbaum, C., Gross, D., & Miller, K. (1993). Introduction to WordNet:
An on-line lexical database. International Journal of Lexicography, 3(4), 235-312.
Mukherjee, R., & Mao, J. (2004). Enterprise Search: Tough Stuff. ACM Queue, 2(2), 36-46.
Nyberg, E., Mitamura, T., Frederking, R., Pedro, V., Bilotti, M., Schlaikjer, A., & Hannan, K. (2005).
Extending the JAVELIN QA System with Domain Semantics. Proc. of the Workshop on Question
Answering in Restricted Domains at AAAI, U.S.A.
O'Donnell, M. (2000). RSTTool 2.4 -- A Markup Tool for Rhetorical Structure Theory. Proc. of the
Int. Natural Language Generation Conference (INLG'2000), pp. 253-256. Israel.
Resnik, P. (1995). Using Information Content to Evaluate Semantic Similarity in a Taxonomy. Proc.
14th Int. Joint Conf. on Artificial Intelligence, pp. 448-453.
Robertson, S. E., Walker, S., Jones, S., & Hancock-Beaulieu, M. G. (1995). Okapi at TREC-3. Proc. of
the Third Text REtrieval Conference (TREC-3), NIST Special Publication 500-225.
Salton, G. (1989). Advanced Information-Retrieval Models. In Automatic Text Processing (Salton, G.
Ed.), chapter 10. Addison-Wesley Publishing Company.
Sekine, S., & Grishman, R. (2001). A Corpus-Based Probabilistic Grammar with Only Two Non-
Terminals. Proc. Fourth Int. Workshop on Parsing Technologies, pp. 216-223. Czech Republic.
Taboada, M., & Mann, W. (2006). Rhetorical Structure Theory: Looking back and Moving ahead.
Discourse Studies, 8(3) (to appear).
Teufel, S. (2001). Task-based evaluation of summary quality: Describing relationships between
scientific papers. Proc. Int. Workshop on Automatic Summarization at NAACL, U.S.A.
Voorhees, E. M. (2002). Overview of the TREC 2002 Question Answering Track. Proc. of the Text
REtrieval Conference (TREC).
Williams, S., & Reiter, E. (2003). A corpus analysis of discourse relations for natural language
generation. Proc. of Corpus Linguistics, pp. 899-908. U.K.